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INTRODUCTION 


The new inhibitors of the ErbB2 tyrosine kinase domain are potential lead compounds 
against the most aggressive forms of breast cancer. To inhibit the catalytic domain of the 
receptor, we intended to crystallize the tyrosine kinase domain of the receptor and 
identify new lead candidates through a new virtual ligand screening procedure. The key 
component and prerequisite of this technology is the three-dimensional model of the 
target domain. This structure can either be obtained through an X-ray crystallography 
experiment, the best case scenario, or through model building by homology to other 
know structures. The next step is to improve the flexible docking procedure and perform 
it in a high throughput manner to predict the binders of the target of interest. We have 
pursued all three directions: tried to crystallize the domain, built several models of ErbB2 
and worked on the optimization of parameters for the flexible docking calculations. 

BODY 

Task 1. To determine a high-resolution crystal structure of the tyrosine kinase domain 
ofErbB2 (ErbB2-TKD). 

The experimental determination of the ErbB2-TKD has been pursued by many 
laboratories and companies, but so far no one has reported a success. The main bottleneck 
is obtaining a reasonably large diffracting crystal. In that effort we focused on an attempt 
to take advantage of the new microstallization technology which allows one to try many 
different crystal conditions at a time. 

We have spent the past year developing high throughput technologies to aid in the 
structure determination of ErbB2. These technologies have been designed to allow for 
high throughput co-crystal structure determinations, specifically to allow one to 
determine a large number of inhibitor compounds that are predicted based on the virtual 
ligand screening. Once a series of compounds have been found to be potential inhibitors 
by virtual ligand screening and are confirmed by cell-based assays, we will be able to 
quickly place these compounds into the robotics system and accurately determine their 
binding site. 

Using technology developed at Lawrence Berkeley National Laboratory and relocated to 
the Genomics Institute for the Novartis Research Foundation last year, we have 
determined that we can reproducibly create 20 nanoliter droplets for protein 
crystallization trials. This current system evaluates a screen of 480 different 
crystallization conditions (thus requiring 9.6 microliters per screen at a typical protein 
concentration of 10 mgs/ml). In addition, we have developed an imaging system capable 
of imaging and analyzing up to 138,000 trials per day. Since ErbB2 is expressed in low 
quantities in a baculovirus expression system, this nanoliter volume technology for 
protein crystallization will be extremely useful for the crystallization of the protein target. 
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We have also designed a novel plate that uses the nanoliter volumes described above and 
is capable of being placed into a high throughput screening system and collecting 
compounds in the same location as the crystallization trial. This will allow one to quickly 
screen through potential inhibitors in a fast and efficient manner. The plates use a 
standard 96 well format, but have a unique design to allow them to both collect screened 
compounds but also conduct crystallization experiments. 

Task 2. To build structural models by homology for the kinase domains 

In the absence of the experimental three-dimensional structure of the ErbB2 TKD, we 
embarked on the homology study of this domain. There are several different kinase 
domains with known three-dimensional structure, but all of them fall in the weak zone of 
sequence similarity between 30 and 40% (Fig. 1). The domains include: the FGFR TK 
(fibroblast growth factor receptor), the insulin receptor and its mutant, the LCK kinase, 
SRC tyrosine kinase and haematopeotic cell kinase HCK (see Fig. 1) 

To estimate the error in the model we built the alignments and models starting from two 
different templates, the fibroblast growth factor receptor represented by the lfgk entry in 
the protein databank and the SRC tyrosine kinase lfmk entry. The first model was built 
on the basis of the alignment presented in Figure 2. In this alignment we built all the 
missing loops and where necessary the loops have been expanded and searched against 
an entire pdb database. 

Figure 3 compares the template and the model and shows the amino acids that are 
different between the two. 

The analysis of the active site alone shows that the SRC kinase is a slightly better 
template than the FGFR tyrosine kinase domain. The local residue conservation in the 
active site is higher for the SRC template. We have built another model of ErbB2 to be 
able to compare the two models built using different three-dimensional templates. 

Figure 4 presents a binding site of the model with docked ATP analogue. The binding site 
of ErbB2 is distinctly different from both the SRC kinase binding site (Figure 5) and the 
FGFR binding site. Both in SRC and FGFR domains tyrosine 340 (the number is taken 
from 2src structure) is replaced by leucine in ErbB2 domain. Another essential 
difference: alanine 403 in both template structures is replaced by threonine, and V561 of 
the FGFR structure is replaced by threonine. 

The differences we observed as a result of modeling can be used to design molecules 
specific to ErbB2 domain as opposed to the other kinases. The threonine substitutions 
can be specifically targeted by a number of hydrogen bond donors and acceptors of a lead 
compound. The leucine substitution leads to a large hydrophobic pocket that can also be 
targeted in our design. We are currently working on better ways to exploit small 
differences in the active sites. 

Improvements of the docking technology. 
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We have used docking technology from Molsoft. Several attempts have been made to 
estimate the accuracy of the procedure and validate the technology, which we intended to 
use to search for new lead compounds. The best validation of the technology was a real 
application of flexible docking to the design of RAR antagonists (Ref. 1) and agonists 
(Ref 2.). 

In manuscripts Ref. 3 and Ref. 4 we have demonstrated how the parameters of the 
docking and scoring functions can be optimized using the known protein-ligand 
complexes. In a separate effort we collaborated with the laboratory of Prof. Charles 
Brooks HI, who compared four different docking methods which could have been used 
for the virtual ligand screening step of this project. Our original approach, using ICM, has 
proved to be the best docking and screening procedure. 

KEY RESEARCH ACCOMPLISHMENTS 

1) The microcrystallization robotic system has been tested and optimized. The 
nanoliter crystallization will allow to try many different crystallization conditions 
for ErbB2 

2) The models were built for ErbB2 based on two different templates with known 
three-dimensional structures. 

3) The binding site has been compared between the ErbB2 and FGFR tyrosine 
kinases. 

4) The ICM docking procedure has been compared with other docking algorithms 
and is ready to be used for lead discovery. 

REPORTABLE OUTCOMES 

A web site with ErbB2 models has been created. 

The manuscript comparing different flexible docking methods and the virtual ligand 
screening algorithm has been prepared for publication (Ref. 5). 

CONCLUSIONS 

After preliminary work on microcrystallization we expect a reasonable chance to 
crystallize the ErbB2 domain, even though it remains extremely challenging. Using the 
above technology, we are now ready to begin crystallization trials. 

We are ready to start docking studies using the models of ErbB2. These studies should 
first explain the pattern of cross-reactivity with other tyrosine kinase ligands. 

In the event that we obtain a crystal structure we will be ready to use it immediately for 
virtual ligand screening. 
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Fig 1. 

All the tyrosine kinase domains with known three-dimensional structure with substantial 
similarity (about 30%) to ErbB2 domain. 
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Figure 2. The alignment used to build the model by homology. The alignment was 
generated by the sequence - structure alignment algorithm which takes the secondary 
structure and accessibility into account. The gaps have been expanded to allow for 
modeling of short loops. 
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Figure 3. A side-by-side view of the template fibroblast growth factor receptor tyrosine 
kinase domain and the ErbB2 model. In yellow are shown the regions of sequence 
identity. 
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Fig. 4. The binding site of the ErbB2 model built using the SRC kinase domain template. 
The molecular surface of the site is colored by physico-chemical properities: 
hydrophobicity (green), hydrogen bonding donor (blue), hydrogen bonding acceptor 


(red). 



Fig. 5 The binding site of the template tyrosine kinase domain (SRC kinase). This site is 
compared with the model of ErbB2 shown on Figure 4. The color code is the same as for 
Figure 4. 
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Nuclear hormone receptors (NRs) are potential targets for thera¬ 
peutic approaches to many clinical conditions, including cancer, 
diabetes, and neurological diseases. The crystal structure of the 
ligand binding domain of agonist-bound NRs enables the design of 
compounds with agonist activity. However, with the exception of 
the human estrogen receptor-a, the lack of antagonist-bound 
"inactive" receptor structures hinders the rational design of re¬ 
ceptor antagonists. In this study, we present a strategy for design¬ 
ing such antagonists. We constructed a model of the inactive 
conformation of human retinoic acid receptor-a by using informa¬ 
tion derived from antagonist-bound estrogen receptor-a and ap¬ 
plied a computer-based virtual screening algorithm to identify 
retinoic acid receptor antagonists. Thus, the currently available 
crystal structures of NRs may be used for the rational design of 
antagonists, which could lead to the development of novel drugs 
for a variety of diseases. 

M embers of the nuclear hormone receptor (NR) family are 
under the control of a wide variety of hormones and 
ligands, such as steroids, retinoids, thyroid hormone, 1,25- 
dihydroxy-vitamin D3, and prostanoids. Many of these NRs are 
potential targets for the therapy of a variety of diseases: antag¬ 
onists of estrogen receptor-a (ERa) (e.g., tamoxifen) are clin¬ 
ically used for the treatment of breast cancer (1) whereas retinoic 
acid receptor (RAR) agonists and antagonists block the growth 
of a number of neoplastic cells including breast tumor cells (2, 
3). Agonists for retinoid X receptors (RXRs) and peroxisome 
proliferator-activated receptor 7 (PPAR7) are potential candi¬ 
dates for use in the treatment of cancer and diabetes (PPAR7 is 
the receptor for the antidiabetic drug thiazolidinedione) (4-7), 
whereas Nurrl ligands may be useful for treatment of Parkin¬ 
son’s disease (8). Thus, designing molecules that selectively 
activate or inhibit specific NRs is of considerable biological 
significance and will likely have the potential for use in important 
clinical applications. 

The crystal structures of the ligand binding domain (LBD) of 
many members of the NR family recently have been solved, and 
the ligand-dependent structural changes involved in transcrip¬ 
tional activation have been clarified, enabling the structure- 
based design of specific agonists (9, 10). Recent studies on ERa 
also have shed light on the LBD structural changes mediated by 
NR antagonists (11, 12): ERa agonists and antagonists super¬ 
impose well and engage in a very similar network of hydrophobic 
and electrostatic contacts with the receptor. However, in the 
agonist-bound conformation, the C-terminal helix H12 sits like 
a lid on top of the ligand (11) (a similar observation was made 
for virtually all of the NR LBD structures solved so far; ref. 9). 
In contrast, the two ERa antagonists present a protruding arm 
that is not compatible with the “closed lid” conformation (11, 
12) (Fig. L4). As a result, helix H12 is pushed away from the 
ligand binding site and relocates in the coactivator-binding 
pocket of the receptor (Fig. IB) (11). Moreover, the LxxML 
motif (where L is a leucine, M a methionine, and x any residue) 
of the ERa helix H12 mimics, and probably competes with, a 
LxxLL helical peptide found in a wide variety of coactivator 
proteins. The alignment of the LBD of various NRs (13) suggests 


that a common structural mechanism would be for the antago¬ 
nists to induce the relocation of helix HI2 into the hydrophobic 
coactivator-binding groove of the receptor. The observation that 
the progesterone receptor antagonist RU486 superimposes with 
the natural hormone progesterone, but presents a protruding 
arm similar to that of tamoxifen (14,15) provides support for the 
universality of this mechanism of antagonistic activity. 

Our goal in this study is to provide further evidence for this 
hypothesis by building a model of the antagonist-bound confor¬ 
mation of RARa, a NR that plays an important role in the 
differentiation and proliferation of a wide variety of cell types 
and for which only the agonist bound conformation is known 
(16-18), and to rationally and rapidly identify new antagonists 
for this receptor. We built a model of the antagonist-bound 
structure of RAR, based on the ERa/tamoxifen complex (12). 
The model was used for the virtual screening of a database of 
^150,000 available compounds, and antagonist candidates were 
tested in vitro. Two novel antagonists and a novel agonist were 
discovered. The ligands were specific for RAR, confirming the 
validity of our model and the potential therapeutic application 
of our strategy. 

Materials and Methods 

Building of the Model of Antagonist-Bound RAR. A helical peptide 
PLIREMLENP corresponding to helix H12 of RARy was 
docked into the putative coactivator binding pocket of another 
RAR7 molecule. We hypothesized that the IxxML motif con¬ 
tacts the coactivator binding site of the receptor, and an auto¬ 
matic docking procedure was carried out toward this site, with 
flexible protein and peptide side chains, according to a biased 
probability Monte Carlo energy minimization procedure (19, 
20). Two critical features of the interaction between the LBDs of 
NRs and their coactivators were used to carry out the docking: 
(/) The “charge clamp,” initially observed in the complex 
between SRC-1 and peroxisome proliferator-activated receptor 
7 (21), where a conserved glutamate (E414 in R AR7) and lysine 
(K246 in RARy) at opposite ends of the hydrophobic cavity of 
the receptor contact the backbone of the coactivator’s LxxLL 
box, enabled the orientation of the helical peptide, (ii) The 
finding that the leucines of the LxxLL motif of SRC-1 are buried 
in the hydrophobic cavity of the receptor determines which side 
of the helix faces the receptor. Here, the isoleucine, methionine, 
and leucine of the IxxML motif were buried in the binding site 
of RARy. Loose distance restraints were set between the charge 
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Fig. 1. Modeling of the antagonist-bound structure of RAR. Agonist (white) 
and antagonist (cyan) superimpose in the binding pocket of ERa, but the 
antagonist presents an additional protruding arm that pushes helix 12 (HI2, 
green) away (A). As a result, H12 relocates in the coactivator binding pocket 
of the receptor (HI2, red) (B). Based on the ERa structure, helix H12 of RARy 
(red) was docked to the coactivator binding pocket of the RARy-LBD (critical 
hydrophobic residues are displayed in magenta) (Q, and the C terminus of the 
protein was remodeled from its agonist-bound conformation (green) to its 
antagonist-bound conformation (red) (D). 

clamp of the receptor (21) (i.e., E414 and K246) and backbone 
nitrogen and oxygens of the peptide (nitrogen of the isoleucine 
on one end, and carbonyl of the methionine, leucine, and 
asparagine in the MLEN motifs, respectively). The energy of the 
complex was minimized in the internal coordinate space by using 
the modified ECEPP/3 potentials. The subset of the variables 
minimized with the ICM method (19, 20,22,23) included the side 
chains of the receptor, six positional variables of the helix, and 
the side-chain torsion angles of the helix. 

After the icm docking procedure, we built a model of antag¬ 
onist-bound RARy. The structure of the receptor was kept rigid 
but for the side chains and backbone of the 25 C-terminal 
residues (corresponding to the last 10 residues of helix Hll, the 
loop from Hll to H12, and H12), and for the side chains of the 
putative coactivator binding site (within 6 A of the previously 
docked helical peptide). Tethers then were set between the C 
terminus of the receptor and the corresponding residues of the 
docked helical peptide, and the energy of the receptor was 
minimized by a stochastic global energy optimization in the 
internal coordinate space (22, 23). 

The last step was, from the resulting model of antagonist- 
bound RARy, to derive the structure of the antagonist-binding 
pocket of R ARa: the three nonidentical residues in the vicinity 
of the binding pocket (A234, M272, and A397) were changed to 
the RARa isoform (S234, 1272, and V397, respectively) and 
energy-minimized. Another possibility would have been to in¬ 
troduce the mutations before remodeling the C terminus of the 
receptor. We preferred to proceed as described here to preserve 
the integrity of the receptor during the critical remodeling of the 
C-terminal end. 

Receptor-Ligand Docking. An initial docking was carried out with 
a grid potential representation of the receptor and flexible 
ligand (24). The resulting conformation then was optimized with 
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a full atom representation of the receptor, flexible receptor side 
chains, and flexible ligand, by an ICM stochastic global optimi¬ 
zation algorithm as implemented in the MolSoft ICM 2.7 program 
(23, 24). 

Screening of a Virtual Library of Compounds. The flexible-ligand/ 
grid-potential-receptor docking algorithm (23, 24) was carried 
out automatically on the Available Chemicals Directory library 
of 153,000 available chemical compounds (MDL Information 
Systems, San Leandro, CA). The screening took less than a 
month on 10 194-MHz IP25 processors. Each compound was 
assigned a score, according to its fit with the receptor, which took 
into account continuum as well as discreet electrostatics, hydro- 
phobicity, and entropy parameters (25). The distribution of the 
compounds according to their score is presented at http:// 
abagyan.scripps.edu/PNAS/MS2000/. All compounds scoring 
better (i.e., lower) than -32 were screened further for the 
number of hydrogen bonds engaged with the receptor. The 134 
compounds that made at least two hydrogen bonds with the 
receptor were preselected. The 609 compounds scoring better 
than -37 also were preselected, regardless of the hydrogen 
bonding network. This preselection pool then was further min¬ 
imized with a full atom representation of the receptor, as 
described above. The quality of the fit of the 500 best-scoring 
compounds then was visually estimated, and 32 compounds were 
selected for biological testing. These compounds are not neces¬ 
sarily the ones with the best final scores, but the ones we thought, 
after careful visual inspection, presented the best characteristics, 
such as Van der Waals fit or hydrogen bonding (see http:// 
abagyan.scripps.edu/PNAS/MS2000/). 

It occurred to us that during the selection by the MolSoft 
virtual screening procedure, it was preferable to set up an initial 
cut-off value poorly selective (i.e., -32) to recover a large pool 
of preselected compounds and to apply to this pool subsequent 
screens specific for the system, such as number of hydrogen 
bonds (used here) or presence of a hydrogen bond acceptor (for 
example) at a specific point of space. As a result, we derived the 
value -32 as a good initial threshold (this value generates an 
initial pool of 3,000-4,000 compounds). 

Biological Activity of the Antagonist and Agonist Candidates. HeLa 
cells were transfected by calcium phosphate precipitation using 
1 /xg of the Gal4-responsive chloramphenicol acetyltransferase 
(CAT) reporter pMCllO and 1 fxg of Gal4-hRARa-LBD or 1 fig 
of Gal4-hRXR/3-LBD. Studies also were performed with the 
three wild-type hRAR isoforms (hRARa, hRAR/3, and 
I 1 RAR 7 ) by using a AMTV-IR-CAT reporter as described (26, 
27). Cell cultures were supplemented with indicated ligands 
immediately after addition of the calcium phosphate/DNA 
precipitate. Media and ligands were replaced after 24 h, and cells 
were harvested and essayed for CAT activity 24 h later. 

Results 

Modeling of the RAR Antagonist Binding Pocket. The x-ray structure 
of RARy bound to the agonist all-trans RA is available (18); 
however, the conformation of the receptor bound to an antag¬ 
onist is not known. We used the observations made from the 
structure of ERa bound to an agonist, 17/3-estradiol (11), and 
two antagonists, tamoxifen and raloxifene ( 11 , 12 ), to build a 
model of antagonist-bound RAR (Fig. 1 A and B ). We docked 
helix H12 of RARy into the putative coactivator binding pocket 
of the receptor as described (27) (see Materials and Methods for 
details) (Fig. 1C) and remodeled the 25 C-terminal residues, 
starting near the end of helix 11 , through an extensive global 
energy minimization procedure (Fig. ID). 

Docking of Known RAR Antagonists into the Modeled Receptor. A few 

RAR antagonists have been described in the literature; and 
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Fig. 2. RAR antagonists. Two known antagonists (A and B) and two novel 
antagonists (Cand D). (Left) Chemical structure. (Right) Conformation docked 
into the receptor (part of the receptor is displayed as a ribbon representation, 
and the binding pocket boundary is displayed in yellow). Cyan, carbons; red, 
oxygen; blue, nitrogen; magenta, fluorine; yellow, sulfur. Hydrogens are not 
represented for clarity. 

several of them are serious candidates for cancer therapy (2,28). 
A well-characterized ligand is AGN193109, which inhibits the 
three RAR isoforms at nanomolar concentrations (29). Another 
very potent antagonist is MX781, which is effective against 
ERa-positive and -negative breast cancer cells, with no apparent 
toxicity (2). The activity of these two ligands has been presented 
in detail, but no structural information has been reported on 
their mode of interaction with the receptor. We built a model of 
RARy complexed either with AGN193109 or MX781, by using 
our flexible docking algorithm (24) (Fig. 2 A and B ). In both 
cases, the antagonist superimposed with the agonist all-trans 
RA. As observed for ERa, the antagonists also presented a 
protruding arm, which was absent in RAR agonists. Very 
importantly, this protruding arm coincided exactly with the 
single opening in the ligand binding pocket of our modeled 
receptor, generated by the displacement of helix H12 (Fig. 2 A 
and B ), and made stabilizing hydrophobic contacts with the 
protein. It is very unlikely that this perfect fit, observed for both 
antagonists, was fortuitous. On the contrary, this feature mimics 
the inactivation mechanism revealed by the crystal structure of 
ERa bound to tamoxifen and raloxifene. Therefore, our docking 
results of AGN193109 and MX781 very strongly suggest that: (i) 
the structural mechanisms of antagonist activity for ERa are 
shared by other NRs, and (ii) our model of the RAR antagonist 
binding pocket could be used to design novel antagonists. 

Screening of a Virtual Library and Discovery of Novel RAR Antagonists. 

High throughput functional screening currently is the most used 
method for the discovery of receptor-specific ligands. Although 


Antagonist 1 Antagonist 2 



Antag - + + + 

(20 \M) ^ 

Antag — — “ + ”+ “ + 

(2 MM) ^ 

Fig. 3. Functional assays of the novel antagonists. HeLa cells were trans¬ 
fected with a Gal4-hRARa-LBD expression vector and a Ga 14-CAT reporter 
gene (results were similar in studies using the three hRAR isoforms). The cells 
were incubated with 5 nM all-trans RAto stimulate CAT activity, and the effect 
of each antagonist on inhibiting CAT was examined at 2 and 20 ;uM concen¬ 
tration (the known antagonist RO-41-5253 was used as a positive control). 

efficient, it requires the physical availability and management of 
hundreds of thousands of chemical compounds. In the present 
work, we used a virtual library composed of the predicted 
structure of more than 150,000 available compounds (see Ma¬ 
terials and Methods). Each compound was automatically docked 
in a grid representation of the modeled RARa antagonist 
binding pocket. Five grid potentials carried information on the 
shape, hydrophobicity, electrostatics, and hydrogen-bonding 
availability of the receptor, and enabled a rapid docking simu¬ 
lation (24, 25). RARa was selected over the other two isoforms 
(RAR/3 and RARy) because recent data suggests it could be a 
medically more relevant target (28). After an automatic selection 
procedure with flexible ligands, and optimization of the selected 
candidates with flexible protein side chains (see Materials and 
Methods for details), 32 compounds were considered as potential 
antagonists of RARa and ordered. 

To test these compounds in vitro , HeLa cells were transfected 
with a Gal4-hRARa-LBD expression vector and a Gal4-CAT 
reporter gene (26). Studies also were performed with the three 
wild-type hRAR isoforms and a AMTV-IR-CAT reporter (26, 
27). These gave similar results as those found with Gal4- 
hRARa-LBD (data not shown). The cells were incubated with 
all-trans RA to stimulate CAT activity, and the effect of each 
antagonist candidate on inhibiting CAT stimulation by all-trans 
RA was examined. Possible toxicity of the compounds was 
deduced from the amount of cellular protein extract after 2 days 
of incubation. Two antagonist candidates inhibited CAT activity 
by 55% and 33% at 20 pM with no apparent toxicity (Fig. 3). The 
Gal4-hRARa activity illustrated in Fig. 3 was equivalent for the 
other two RAR isoforms (data not shown). No inhibition was 
observed when CAT expression was under the control of a 
Gal4-mRXRj3-LBD fusion construct, indicating that: (/) the 
antagonists are specific for RAR, and (ii) the inhibition is caused 
by an interaction with the Gal4-RAR-LBD fusion protein and 
does not result from some nonspecific effect on CAT activity 
(data not shown). 

The two RAR antagonists dock into the ligand binding pocket 
of the receptor (Figs. 2 C and D and 4). As observed for 
AGN193109 and MX781, they fit in the same binding pocket as 
the natural agonist all-trans RA, but present an additional arm, 
which protrudes out of the pocket. Antagonist 1 has a tri-f luoro 
group where the retinoid receptor ligands usually carry a car- 
boxylate group (in antagonist 2, the corresponding domain is 
truncated). In our model, antagonist 2 engages in a hydrogen 
bond with Ser-234 of the hRARa (Fig. 4 B). However, the S234A 
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Fig. 4. Novel RAR antagonists. (A and B) Stereo representation of antago¬ 
nists 1 and 2 docked into the binding site of the receptor. The ligands make 
extensive hydrophobic interactions with residues from helix 3, helix 5, and 
helix 11. Antagonist 2 ( B ) is engaged in an additional hydrogen bond with 
Ser-234 of helix 3 and contacts the remodeled C terminus (red) at Pro-405. (C 
and D) The fit of antagonists 1 and 2 intothe receptor binding pocket isshown. 


mutation in the other two isoforms does not alter the ligand 
antagonist activity, suggesting that this hydrogen bond is not 
essential for the interaction. An obvious way to increase the 
affinity of these antagonists would be to substitute the tri-f luoro 
group by a carboxylate in antagonist 1 or elongate and add a 
carboxylate to antagonist 2, which would result in more stabi¬ 
lizing interactions with two conserved arginines of the receptor. 
However, the purpose of this work is to provide evidence that the 
rational design of antagonists from the model of the inactive 
receptor is feasible and not to optimize the affinity of the 
compounds. The in vitro functional assays provide evidence that 
our modeling scheme is relevant and can be used to design novel 
antagonists of NRs. 

We applied the same strategy to discover agonists, by using the 
crystal structure of the active conformation of RARy (18), and 
could discover three novel agonists 10-25% active at 200 nM and 
fully active at 20 /xM, of 30 compounds tested (data not shown). 

Screening of a Database of Known Ligands. To assess the quality of 
our setup of the ICM screening algorithm (23), we built a small 


virtual database made up of antagonists and agonists for differ¬ 
ent members of the NR family (Table 1). We screened this 
database with our model of antagonist-bound RAR, as we did 
for the Available Chemicals Directory database. The screening 
was repeated four times, to test the reproducibility of our 
method. Table 1 shows that for each ligand the score varies a lot 
from one screening to the other. This finding reflects the 
generation of different ligand conformations from one docking 
simulation to another (data not shown) and represents the 
limitation of our method, as discussed below. 

Table 1 lists as “selected” the ligands that met with the criteria 
for preselection and final inspection during the Available Chem¬ 
icals Directory screening (i.e., score better than -37 or score 
better than -32 and at least two hydrogen bonds with the 
receptor; see Materials and Methods for details). Seven of the 
nine known RAR ligands (i.e., ^80%) and one of the six 
non-RAR ligands (i.e., **16%) were selected. The fact that RAR 
agonists, as well as antagonists, produced good scores was 
expected, because the binding pocket used for the screening is 
equivalent to the agonist binding pocket, with an additional 
opening generated by the remodeling of the C terminus of the 
receptor. The two false negatives, AGN193836 and Ro415253, 
were missed because of steric clashes, as discussed below. 
Antagonist 1 was not found either, reflecting its rather low 
affinity for the receptor. It is important to underline here that 
we do not expect to detect all of the true binders. The algorithm 
was rather designed to minimize the number of false positives, 
which correlates with the number of unnecessary in vitro 
experiments (25). 

In that respect, the presence of one false positive of six 
nonbinders could be alarming, because such a ratio would 
represent about 25,000 false positives of a database of 150,000 
compounds. However, the binding pockets of the NRs repre¬ 
sented in this database are close in size and shape; as a result, the 
database used for this benchmark was composed of molecules 
presenting strong similarities with RAR ligands. Therefore, we 
believe this ratio is not representative. The fact that we needed 
to test only 32 molecules to discover three novel RAR ligands 
confirms this assumption. 

Next, we tried to address why some ligands, such as Ro415253, 
were repeatedly missed by our screening algorithm (Ro415253 
was still not selected after 10 docking simulations, data not 
shown). We hypothesized that the ligand could not fit into the 
potential maps generated from our model and carried out a 
docking simulation with a full atom representation of the 
receptor, according to a Monte Carlo energy minimization of the 
complex, with both flexible ligand and flexible receptor side 
chains (24). This docking simulation produced a solution were 
the ligand fits into the binding pocket; the core of the ligand 
(from the carboxylate to the internal sulfone) superimposes with 
agonists such as all-trans RA, whereas the alkyl arm sticks out 
of the pocket, as previously described for the other antagonists 
(data not shown). The conformation of several receptor side 
chains was modified during the docking simulation, to accom¬ 
modate the size of the ligand, and this solution would not have 
been found with rigid side chains. This finding suggests that 
Ro415253 could not fit into the potential maps generated from 
the original receptor conformation, which we used for the 
screening. We generated a new series of potential maps from the 
optimized receptor structure and screened the small database of 
known ligands with these maps four times as above (Table 1). 
The score assigned to Ro415253 was twice lower (i.e. better) 
than the threshold. Surprisingly, this new series of potential maps 
totally eliminated the presence of both false positive and false 
negative (all RAR ligands and only RAR ligands were selected). 
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Table 1. Control screening of known NR ligands 


Ligand 

First series 

AGN193836 

ATRA 

Ro415253 

MX781 

CD2366 

Targ retin 

SR11203 

Tamoxifen 

Raloxifene 

RU486 

9cisRA 

AGN193109 

AGNpartia 

Am580 

EM652 

Antagonist 1 

Antagonist 2 

Second series 

AGN193836 

ATRA 

Ro415253 

MX781 

CD2366 

Targretin 

SR11203 

Tamoxifen 

Raloxifene 

RU486 

9cisRA 

AGN193109 

AGNpartia 

Am580 

EM652 

Antagonist 1 

Antagonist 2 


Activity 

Score 1 

Score 2 

Score 3 

RAR_agonist 

-19.9 

-9.04 

-20.6 

RAR pan-agonist 

-46.4 

-41 

-41.7 

RAR_antagonist 

-25.5 

-22. 

-28.3 

RAR antagonist 

-28. 

-23.9 

-27.1 

RAR pan-antagonist 

-28.5 

-23.3 

-30.9 

RXR pan-agonist 

-17.9 

-18.1 

-19.1 

RXR pan-agonist 

-27.5 

-27. 

-27. 

ER modulator 

-29.3 

-27.5 

-29.8 

ER modulator 

-23.4 

-20.8 

-26.7 

Progest Rec antag. 

-21.2 

-21.3 

-21.4 

RAR/RXR agonist 

-32.5 

-32.6 

-32.9 

RAR pan-antagonist 

-39.2 

-56. 

-57.4 

RAR partial agonist 

-54.4 

-54.3 

-49.5 

RAR_agonist 

-34.2 

-34.4 

-34.8 

ER antagonist 

-27. 

-27.4 

-21.7 

Novel RAR antag. 

-28.5 

-28.1 

-28.7 

Novel RAR antag. 

-27.6 

-38.9 

-40.2 

RAR_agonist 

-37.2 

-36.5 

-36.7 

RAR pan-agonist 

-51.7 

-52.6 

-51.8 

RAR_antagonist 

-28.9 

-24.4 

-39.0 

RAR antagonist 

-45.3 

-48.0 

-40.2 

RAR pan-antagonist 

-50.7 

-50.8 

-29.3 

RXR pan-agonist 

-25.4 

-23.0 

-22.2 

RXR pan-agonist 

-28.2 

-22.7 

-22.1 

ER modulator 

-26.4 

-24.6 

-30.3 

ER modulator 

-15.6 

-23.7 

-18.4 

Progest Rec antag. 

-21.4 

-20.6 

-20.3 

RAR/RXR agonist 

-38.8 

-39.5 

-33.5 

RAR pan-antagonist 

-55.1 

-55.5 

-41.2 

RAR partial agonist 

-61.4 

-61.3 

-61.4 

RAR_agonist 

-46.6 

-47.2 

-46.6 

ER antagonist 

-26.3 

-23.1 

-23.7 

Novel RAR antag. 

-32.1 

-32.1 

-31.7 

Novel RAR antag. 

-33.3 

-29.7 

-33.8 


Score 4 

Selected 

Binding 

References 

-19.7 


+ 

(33) 

-41. 

+ 

+ 

(34) 

-28.6 

- 

+ 

(28) 

-36.4 

+ 

+ 

(2) 

-32.3 

+ 

+ 

(34) 

-18.6 

- 

- 

(4) 

-27.2 

- 

- 

(34) 

-28.3 

- 

- 

(23) 

-34.6 

+ 

- 

(22) 

-21.3 

- 

- 

(25) 

-16.9 

+ 

+ 

(34) 

-39.4 

+ 

+ 

(29) 

-29.1 

+ 

+ 

(29) 

-34.5 

+ 

+ 

(34) 

-28.8 

- 

- 

(35) 

-28.8 

_ 

+ 

(35) 

-26.3 

+ 

+ 

(35) 

-35.3 

+ 

+ 

(33) 

-52.0 

+ 

+ 

(34) 

-46.6 

+ 

+ 

(28) 

-45.6 

+ 

+ 

(2) 

-29.3 

+ 

+ 

(34) 

-31.0 

- 

- 

(4) 

-27.5 

- 

- 

(34) 

-23.4 

- 

- 

(23) 

-17.4 

- 

- 

(22) 

-20.1 

- 

- 

(25) 

-38.7 

+ 

+ 

(34) 

-54.8 

+ 

+ 

(29) 

-61.0 

+ 

+ 

(29) 

-46.5 

+ 

+ 

(34) 

-27.3 

- 

- 

(35) 

-31.6 

+ 

+ 

(35) 

-33.8 

+ 

+ 

(35) 


First series; A similar screening as the one performed on the ACD database was carried out four times on a small database made of known RAR antagonists, 
agonists, as well as ligands for other NRs and the two novel RAR antagonists. The ligands that met at least once with the criteria for selection used during the 
ACD screening are listed as Selected. The ligands that are experimentally binding to RAR are listed as Binding. Second series: Screening of known ligands after 
adjustment of the receptor's binding pocket conformation. The RAR antagonist Ro415253 was docked into our model of antagonist-bound RAR with flexible 
receptor side chains and ligand. The resulting receptor conformation was used for a novel screening. 


Discussion 

In this study, we presented a strategy for the discovery of 
antagonists, as well as agonists, for NRs, which are very impor¬ 
tant targets for drug design. An important aspect of our ap¬ 
proach was to exclude any preconceived pharmacophore bias 
from our database screening. Most drug design strategies impose 
chemical constraints on the selected molecule to conserve the 
functional groups believed to be most important in existing 
ligands, preventing the discovery of novel ligand types. In the 
present work, we avoided pharmacophore constraints thanks to 
a robust flexible docking program and scoring function: the only 
filters used for screening were a good fit with the receptor and 
reasonable bioavailability parameters (30). As a result, we 
discovered novel original ligands that could be further optimized 
into potent RAR-selective antagonists and agonists. 

A limitation of our method, which leaves room for further 
improvement, is that a compromise must be made between the 
time allocated for each ligand (less than 2 min on one processor 
here) and the reliability of the sampling of the conformational 
space. Indeed, Table 1 shows that four runs for each ligand are 
necessary to minimize efficiently missed hits (the remaining 


missed positives were not selected because of inappropriate 
receptor side-chain conformations and not because of an insuf¬ 
ficient sampling). Improvement of the computing power, the 
docking algorithm, and the scoring function all could result in a 
more robust virtual database screening. 

Another drawback is that the conformation of the receptor is 
not necessarily unique, but can vary from one ligand to another. 
As a result, a ligand that fits in receptor conformation A will 
never be found if receptor conformation B is used for the 
screening. The case of Ro415253 illustrates this issue well: this 
known antagonist was never selected, even after 10 trials, 
because the binding pocket used for the screening was too 
narrow. The potential maps used for the screening have a 
smoother van der Waals profile than the atomic representation 
of the receptor; as a result, the maps are more tolerant regarding 
steric clashes with the ligand. However, the degree of tolerance 
is limited and cannot accommodate important conformational 
changes of the receptor side chains (or backbone, obviously). 
When new potential maps generated from a model of RAR 
bound to Ro415253 were used for screening, the three RAR 
ligands missing from the first screening were selected (Table 1). 
This finding confirms that the initial conformation of the 


1012 | www.pnas.org 


Schapira et al. 


receptor prevented the selection of, or reduced the chances of 
selecting, some known RAR ligands. The false positive ralox¬ 
ifene (Table 1) was making extensive van der Waals interactions 
with the narrow RAR binding pocket, which compensated for 
the lack of stabilizing electrostatic interactions. However, in the 
new conformation of the receptor (Table 1), the binding pocket 
is wider and the fit not as tight. As a result, raloxifene was not 
selected. This observation emphasizes, if necessary, that virtual 
screening is very sensitive to the conformation of the receptor. 

In that respect, it is interesting to note that the topology of the 
remodeled C-tcrminal loop is probably not unique, and that the 
conformation used to generate the receptor potential maps was 
one among many others. It is therefore legitimate to wonder 
whether novel antagonists could not be discovered as efficiently 
from a structure of the receptor where the C terminus, instead 
of being remodeled, was truncated. This brings up a fundamental 
question: is the role of antagonists only to antagonize the “closed 
lid” conformation where helix HI2 sits on top of the ligand 
binding pocket, or are they also stabilizing the inactive confor¬ 
mation of the receptor? It is important to keep it mind that the 
C-terminal tail of RAR (as well as for other NRs) is a very 
dynamic entity when no ligand is bound to the receptor and 
probably oscillates between active and inactive conformations. 
Once bound in the ligand binding pocket, agonists contact the 
H12 helix and lock the receptor in its coactivator-binding 
conformation. Likewise, it is reasonable to speculate that an¬ 
tagonists would contact the C-terminal tail of the receptor and 
stabilize the inactive state. However, it is probable that the 
conformation of the receptor varies from one ligand to another; 
indeed, recent results on ER a show that different ligands induce 
distinct conformational change of the receptor (31). We used the 
crystal structure of ER a bound to tamoxifen to build our model 
of inactive RAR and could find two specific antagonists, one of 
which contacts the remodeled tail of the receptor. Although the 
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conformation we used for the C-terminal tail was probably not 
the only possible one, we believe that its presence was important 
to bias the screening toward compounds that actually do contact 
the flexible arm of RAR, as well as to impose a reasonable 
boundary on the antagonist binding pocket, and prevent the 
ligands from drifting out of the pocket during the docking 
simulations. 

An important point was to demonstrate that we could discover 
novel antagonists for a NR other than ERa, provided that the 
structure of the agonist-bound active form of the protein was 
known. Rational design of ligands from a model of a receptor is 
thought by many to yield very low success rates. The present 
study demonstrates that this strategy can be successfully under¬ 
taken with appropriate biological systems and robust modeling 
tools. Moreover, targeting models of diverse members of the NR 
family could be further justified by the wealth of structural and 
sequence information (9, 13), as well as the finding that NR 
family members share similar mechanisms of transcriptional 
activation and inhibition (9). 

The recent publication of the crystal structures of medically 
relevant receptor targets, such as peroxisome pro 1 if era tor-activated 
receptor y (21), RAR (18), RXR (32), ERa (11), or progesterone 
receptor (15), has created an exciting opportunity for the discovery 
of novel ligands. This study demonstrates that the rational design of 
both antagonists and agonists, by using computer-generated models 
based on these structures, is possible. 
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Introduction 


Formation of non-covalent complexes is an essential part of almost any 
biological process. Remarkable complexity of the biochemical machinery of the 
living organisms would have been impossible without the ability of the 
participating molecules to recognize each other among thousands of other 
compounds simultaneously present in any cell. Specific binding between 
molecules is crucial in catalysis, signal transduction, molecular transport 
mechanisms, and determines the pharmacological effect of many drugs. 

Better knowledge of the nature of molecular recognition on the microscopic 
level is important for our understanding of the normal and pathological processes 
in the cell and may help in such practical applications as drug design. X-ray 
crystallography has revealed detailed atomic descriptions of many individual 
proteins, nucleic acids and small biological molecules, as well as a number of 
structures of complexes. The Protein Data Bank (PDB) (Bernstein et al. 1977), 
where solved protein 3D structures are deposited is growing by about 1000 new 
structures a year. Available structures of complexes can be analyzed to discover 
the basic interactions and principles of molecular recognition, while the individual 
structures can be used in the prediction of unknown or novel complexes. First 
attempts to predict molecular interactions and design novel ligand utilized hand¬ 
made physical models of receptor sites and ligands (Beddell et al. 1976). Since the 
manipulation of the systems containing hundreds or thousands of atoms is 
necessary to simulate the binding process, the progress in numerical computational 
approaches was essential for the advancement of the macromolecular association 
studies. Computer simulations of molecular recognition were first attempted more 
then twenty years ago (Kuntz et al. 1982). Considerable progress has been 
achieved in the recent years, but reliability and precision of the existing complex 
prediction methods is still far from ideal. 



Molecular docking simulations 


Prediction of the structure of a complex starting from the structures of individual 
molecules is commonly called molecular docking problem. Structures of the 
protein-ligand and especially protein-protein complexes often show remarkable 
shape complementarity on the interface, suggesting the idea that the docking 
algorithms should search for such matching surfaces. Early approaches such as the 
original DOCK (Kuntz et al. 1982) used exclusively this geometric criterion. Both 
components of the complex were assumed rigid and the docking procedure 
searched for favorable mutual orientation using “sphere matching” (DesJarlais et 
al. 1986), least-squares fitting of the surface patterns (Bacon and Moult 1992, 
Leach and Kuntz 1992), Fourier-transform (Katchalski-Katzir et al. 1992), 
distance-matrix matching (Helmer-Citterich and Tramontano 1994) or “geometric 
hashing” (Fischer et al. 1995). Purely geometric approaches demonstrated certain 
success in recombining the structures of protein-protein complexes when the 
components were taken from the native complexed structure, which is somewhat 
artificial starting point. In the more realistic cases where the individual structures 
of the constituents were used, these techniques often failed to distinguish the 
correct orientation (Bacon and Moult 1992). High complementarity of the 
interacting surfaces in the native complexes is in part due to the “induced fit”, e.g. 
the conformational change in the constituents of the complex upon binding, while 
individual structures often do not show the perfect matching expected in the 
complex. There are two general directions in which the simplistic geometric 
docking algorithms are being improved. First is the introduction of flexibility of 
ligand and/or receptor to reproduce or mimic the induced fit, and the second is the 
inclusion of binding determinants other than pure surface complementarity. First 
attempts to introduce flexibility in protein-protein docking were limited to 
“sCitcaiiig” of the geometric criteria which would allow certain degiee of 
penetration between the two interacting surfaces (Jiang and Kim 1991, Walls and 




Sternberg 1992). Direct simulations with all-atom models may account for the 
flexibility more accurately and sometimes show promising results (Nilges, M. & 
Brunger 1993, DiNola et al. 1994, Abagyan et al. 1994), but are often extremely 
computationally expensive. Whichever way the flexibility is introduced, it results 
in much greater ambiguity of the results of geometric docking, since many 
approximate matches can be found. The multiplicity of solutions calls for 
additional criteria to select the correct answer. This lead to the inclusion in the 
docking protocol of the other binding determinants such as estimates of solvation 
free energy change or molecular mechanics energy (Shoichet and Kuntz 1991), 
and ultimately, the approximations of the free energy change upon binding (Bohm 
1994a,b). Most methods however still use simplistic but faster measures during the 
generation of the bound conformations and than reevaluate the putative solutions 
using more sophisticated potentials. 

Docking as an energy optimization problem 

Complexes considered in the docking studies are in general thermodynamically 
stable systems. Thus, the native bound conformation should represent the global 
minimum of the free energy. Consequently, to find the docked conformation, the 
global minimum of the free energy function of the system has to be located. Since 
the precise evaluation of the free energy is difficult, one can try to use some 
approximation that would have similar global minimum. From the energetic point 
of view, surface complementarity docking methods assume that the interaction 
energy is proportional to the contact area or other similar measure of the fit of two 
surfaces, possibly with some penalty for bad contacts (clashes). While this 
assumption may account reasonably well for van der Waals interactions and, to 
some extent, for solvation, it obviously disregards the energy contributions from 
specific pan wise atomic interactions such as hydrogen bond formation and 
electrostatics. Many recent docking studies try to incorporate these terms, often as 



the additional criteria to select the answer from several solutions generated by 
geometric docking, either using force-field energy evaluation (Shoichet and Kuntz 
1991) or elaborate scoring functions (Bohm 1994b, Jain 1996). In several works, 
physical energy terms were used throughout the algorithm (Abagyan et al. 1994, 
Totrov and Abagyan 1994). 

Two major components are required for a successful prediction of the structure 
of the protein-ligand complex: an efficient global optimization procedure which is 
capable of finding a global minimum for the strongly anisotropic function of 
dozens of variables and a free energy approximation for the complex in solution 
which is computationally inexpensive to be used in the search procedure, yet 
sufficiently accurate to ensure the uniqueness of the native conformation. In the 
following two parts we will review the energy calculations and global 
optimization methods. 

Energy terms 

Energy calculations are at the center of almost any molecular simulation 
technique. It is convenient and customary to divide the energy of the molecular 
system into a number of components, or energy terms. Below, five major terms of 
the molecular interaction energy will be considered in greater detail. 

Electrostatic Interactions 

Electromagnetism is the fundamental force of biochemistry (Davis and 
McCammon 1990). All processes on the molecular level can be described in terms 
of electromagnetic interaction combined with quantum mechanical and 
thermodynamic effects. While covalent and hydrogen bonding as well as Van der 
Waals interaction all have electrostatic nature, these interactions are complicated 
by quantum mechanics and it is often convenient to separate them from the longer- 




range electrostatic interactions. It is the latter type of interactions which is 
customarily referred to as electrostatics in biomolecular structure. All proteins and 
large majority of ligands contain polar atoms interacting strongly with each other 
and the solvent in the wide range of distances. For a charged amino-acid the 
strength of electrostatic forces may exceed by more than an order of magnitude the 
strength of van der Waals interaction (Warshel and Russell 1984). 

The evaluation of electrostatic interactions in proteins was first attempted by 
Lingstrom-Lang in 1924 and a theory of electrostatics in macro molecules was 
proposed (Tanford and Kirkwood 1957). These macroscopic studies gave some 
qualitative insights, but only the availability of high-resolution protein structures 
and computer calculations allowed quantitative studies of protein electrostatics. 

The largest problem in electrostatic calculations is the presence of highly polar 
solvent (water). In vacuum or in the uniform media the interaction between two 
charges can be simply described by Coulomb law 

E = k$& 2 - 
£^12 

where q l<2 are the charges, R 12 is the distance between them, e the dielectric 
constant and k is 332.0 when the charges are electron units, distance is expressed 
in angstroms and energy in kcal/mol. In aqueous environment this relation has to 
be corrected to include the interaction of the charges under consideration with 
large (virtually infinite) number of surrounding water molecules. Early attempts to 
simulate macromolecules without consideration of solvent screening ran into 
difficulties, for example DNA double-helix would be tom apart by electrostatic 
forces unless the electric charges were drastically reduced (Harvey 1989). 

The straightforward and rigorous approach is to include explicitly sufficiently 
thick layer of water molecules into the calculations. Obviously, it makes 
calculations heavier, but the principal difficulty of the explicit methods is that 
liquid water is essentially dynamic environment. Any static placement of water 
molecules around the system under consideration would result in large errors, as 



the physically observed interaction with water is the result of averaging over a 
large thermodynamic ensemble of the possible states of the solvent. Thus, to 
achieve accurate results one has to generate this ensemble by an extensive 
molecular dynamics simulation (Rastelli et al. 1995, Simmerling and Elber 1995). 
While this might be the most rigorous approach to the solvation electrostatic 
calculations, in most cases it is impractical. Langevine dipoles were proposed 
(Rossky et al. 1978, Luzhkov and Warshel 1992) to make implicit averaging over 
water molecule’s orientations, which eliminates the necessity of generation of a 
large ensemble of water configurations. The method was applied in protein-protein 
docking and gave promising results (Jackson et al. 1998). 

The solvent effectively screens the interaction of the charges of the solute. 
Generally the farther from each other and the more exposed to the solvent charges 
are the more their interaction is attenuated. This observation suggested simple 
corrections to the Coulomb law such as distance-dependant dielectric constant and 
charge-scaling. While it is somewhat ad hoc and doesn’t take into account the 
interaction of the individual charges with the solvent (self-energy), distance- 
dependant dielectric constant 8 =b 0 R is widely used because of it’s simplicity 
(McCammon et al. 1979, Pickersgill 1988). This expression actually accelerates 
calculations of the energy and forces because they become dependent only on R 2 
instead of R, eliminating costly square root calculations. Charge scaling was 
shown to improve the simulation results for such systems as DNA. While these 
crude approaches can hardly be used for quantitative evaluation of the properties 
of a macromolecule in solution, they keep the extra calculations to a minimum. 
Alternatively, the solvent can be considered as a continuous medium of high 
dielectric constant. This treatment of solvent is more computationally tractable 
than the inclusion of explicit water molecules. The electric potential in the 
medium of variable dielectric constant obeys the Poisson differential equation 

-V(e(r)V<|)(r))=p(r) 



where e is the dielectric constant (permittivity), (|> is the electric potential, and p is 
the charge density. If s(r)=const, the Poison equation is equivalent to the Coulomb 
law, but the solution becomes more complicated when the space is divided into the 
regions of various dielectric permittivity. Analytic results exist only for special 
cases such as a sphere. Certain methods utilize these analytic solutions to obtain 
relatively simple approximations of energy under an assumption that the protein 
has near-spherical shape, e.g. image method (Friedman 1975, Schaefer and 
Froemmel 1990, Abagyan and Totrov 1994). Similar assumptions are used in 
generalized Bom approximation (Still et al. 1990, Cramer and Truhlar 1992). The 
precision of these methods is rather limited. Much more rigorous approach is to 
solve the Poisson equation numerically. Several techniques based on this idea 
were developed and are widely used in the protein energy calculations (Zauhar and 
Morgan 1985, Juffer et al. 1991, Nicholls and Honig 1991, Zauhar and Vamek 
1996). Main difficulty in their application to docking is high computational cost. A 
hybrid method was recently proposed and used in docking simulation, utilizing 
single numerical solution of Poisson equation for unbound receptor supplemented 
by generalized Bom-type terms calculated for each specific bound ligand 
conformation (Majeux et al. 1999). 

Hydrophobicity 

Transfer to the aqueous solution of a number of organic groups results in a free 
energy loss related to the ordering of water molecules around such groups which is 
known as hydrophobic effect. The concept of hydrophobic interaction was 
introduced by Kauzmann (Kauzmann 1959). This effect is similar in nature to the 
macroscopic surface tension. Hydrophobic interaction is a major driving force in 
the formation of most ligand-receptor complexes. For some ligands such as 
steroids the interaction is almost exclusively hydrophobic, and many other ligands 
are amphiphylic with hydrophobic groups binding into hydrophobic pockets of the 



receptor. By fitting the transfer free energies of hydrocarbons against the solvent 
accessible surface, the hydrophobic contribution was shown (Chothia 1976) to be 
proportional to the solvent accessible surface with fairly good precision. However, 
the coefficient of this proportionality is a subject to some controversy since it 
differs sharply from the microscopically observed value of the surface tension 
constant. Microscopic surface tension value derived from the transfer energies of 
aliphatic compounds is close to 30 cal/A 2 while macroscopic hydrocarbon-water 
surface tension constant is ~75 cal/A 2 . Some attempts were made to explain the 
discrepancy by taking into consideration the curvature dependence of the surface 
tension and the difference of the molar volume of solute and solvent (Sharp et al. 
1991). 

It remains to be seen if the division of the water-solute interaction into solvation 
electrostatics and hydrophobic components is the most adequate approach. 
Methods based on this partitioning were shown to reproduce successfully 
experimental data on transfer free energies for a large set of compounds (Sitkoff 
at al. 1994). However, alternative approaches to water-solute interaction 
evaluation were also developed, particularly a number of atomic solvation 
parameter (ASP) based methods (Eisenberg and McLachlan 1986, Wesson and 
Eisenberg 1992). ASP methods differentiate the atoms of the solute into a number 
of types, each with a particular value of solvation energy surface density, 
generalizing the surface tension. The underlying assumption is that the water- 
solute interaction can be partitioned into atomic contributions, which are 
proportional to the solvent accessible surface areas of the atoms. Popularity of 
ASP approach is in part due to the simplicity and computational efficiency, while 
the drawbacks are that neither proportionality of the solvation energy to the 
accessible surface nor the partitioning of the solvation energy into atomic 
contributions cannot be rigorously justified and are largely ad hoc assumptions. 
Nevertheless, good agreement with experimental data can be achieved (Horton 
and Lewis 1992), which might in part be explained by the large number of 



adjustable parameters in the ASP models. It is questionable that these methods can 
perform well on a set of compounds which is much larger then the set used for the 
parameter adjustment. 

Van der Waals interactions 

The most generic type of interatomic force which exhibits itself as a very strong 
repulsion at short distances and turns into relatively weak and quickly decreasing 
attraction as the distance between two atoms grows. It is commonly described by 
6-12 potential: 

Eyw )- rr + ~jTvL 

where R,j is the distance between the two atoms i and j. Parameters Ay and By 
depend on the types of atoms and are usually calculated using combination rules 
from the parameters for the identical pairs of atoms, which are in turn evaluated 
from quantum-mechanical or experimental data. Usually these parameters are 
derived along with the other components of the atomic interaction energy to form 
so-called molecular mechanics force-fields, such as CHARMM (Brooks et al. 
1983), AMBER (Weiner et al. 1984), MMFF (Halgren 1995) and ECEPP 
(Momany et al. 1975). While the 1/R 6 form of the attraction term has strict 
quantum-mechanical basis, rigorous description of the repulsion term is more 
complicated. Alternative forms of the repulsion term have been proposed (e.g. 
Halgren 1995). Fortunately, the interactions in biomolecular systems occur mostly 
in the range of inter-atomic distances where attractive term is prevalent, and seem 
to avoid strong repulsion, alleviating the problem of the exact description of the 
repulsive term. 

Still, extreme sensitivity of the Van der Waals interactions to the small 
conformational changes makes it's inclusion in the calculation of binding energy 
problematic. This led a number of authors to simply omit the Van der Waals 



contribution in the binding energy, as it seems to introduce more noise than signal 
into the energy estimates (Krystek et al. 1993, Vajda et al. 1994). Such omission is 
partly justified by the cancellation of ligand-receptor interactions in the bound 
state and the ligand-solvent/receptor-solvent interactions in the unbound state. One 
can assume that overall number of inter-atomic contacts in the system remains 
nearly constant upon binding, resulting in the conservation of the total Van der 
Waals interaction energy. Geometric docking can be used to achieve reasonably 
good packing. However, this approach leaves out entirely the dependence of the 
interaction energy on the quality of the interface. In the case of the docking of 
novel ligands the evaluation of the interface is essential for determination of the 
binding likelihood and the correct binding mode. Possible compromise is to 
modify the Van der Waals potential so that it becomes less sensitive to the small 
deviations in atomic coordinates. 

Hydrogen Bonds 

Hydrogen bond interaction is a specific attraction between polar hydrogens and a 
number of heavy atoms (primarily oxygen, nitrogen, sulfur) which have unshared 
electron pairs. The observations of large number of complexes with solved 3D 
structures show that many ligands form extensive networks of hydrogen bonds 
with their receptors, especially in cases of high specificity and high affinity 
binding. Hydrogen bonds also play important role in the protein folding, where 
their formation between the turns of the a-helixes and between the (3-strands 
stabilizes these essential secondary structure elements. Unfortunately, there seems 
to be no agreement so far about the adequate functional form for the hydrogen 
bonding interaction term and even the energetic value of an average hydrogen 
bond. Since its origin lays in the same electrostatic and quantum interactions as the 
origin of Van der Waais and eiecuostauc terms, hydrogen bonding is often 
included in the force field as a modification to the Van der Waals potential for the 



specific atom pairs (Nemethy et al. 1992, Halgren 1995). The modification may 
only involve change in the parameters (MMFF), or a different functional form (10- 
12 instead of the standard 6-12 Van der Waals potential in ECEPP). Some force 
fields simply ignore hydrogen bonding in hope that electrostatic term will provide 
sufficient favorable contribution when positive hydrogen atoms and negative 
hydrogen bond acceptors are brought together. However, the charge distribution 
around the acceptor atoms is highly anisotropic since the unshared electron pairs 
occupy sp v orbitals, resulting in strong anisotropy of the HB interaction. High 
directionality of the HB interaction can also be observed in the solved structures of 
the proteins and protein complexes (Ippolito et al. 1990). This anisotropy is 
largely ignored by pair-wise, atom-centric potentials used by the majority of the 
force fields. Such omission may not lead to large errors as long as only naturally 
occurring conformations are considered since they often already have optimal or 
sub-optimal configuration of hydrogen bonds. However, in the course of a 
simulation, such as docking, it may result in erroneous formation of hydrogen 
bonds of physically impossible geometries. Several forms of hydrogen-bonding 
term with explicit angular dependence were proposed (Goodford 1985, Miller 
1994). 

Conformational Entropy 

Binding of the ligand to the receptor usually imposes strong constraints upon it’s 
conformational freedom. Also, the surface side-chains of the receptor which are in 
contact with the ligand may no longer access some of their rotameric states. There 
is a loss in translational and rotational degrees of freedom, which does not depend 
on the participating molecules and can be seen as constant as long as only 1-to-l 
stoicheometry complexes are considered. Thus, binding may result in substantial 
decrease in the entropy. As an illustration, one can consider the burial of one CH 2 
group in an aliphatic chain. The loss of three rotameric states of the chain results 


in the enthropy loss which adds /?71n3 = 0.66 kcal/mole to the free energy of the 
system, while the decrease in hydrophobic term is around -0.88 kcal/mole (Yang 
et al. 1992). 

Exact determination of the entropy change would require extensive molecular 
dynamics simulations. Currently such simulations are too expensive 
computationally to use them routinely for a large number of putative complexed 
structures. Docking methods generally assume that upon binding the ligand is 
locked in a single conformation. While in some cases this assumption might be far 
from true, it allows exclusion of the conformational enthropy term from docking 
simulations as a constant. 

Conformational search techniques 

An efficient global optimization procedure is a key component of the docking 
protocol. Many approaches treat both ligand and receptor as rigid bodies (Kuntz et 
al. 1982, Cherfils et al. 1991, Bacon and Moult 1992). Such treatment allows for 
rapid location of the optimal mutual orientation of the two molecules by special 
techniques (DOCK), but has limited applicability since the majority of small 
ligands are flexible and structural rearrangements occur in a number of receptors. 
To some extent, the limitations of the rigid-body docking can be circumvented if 
several low-energy conformations of the ligand are generated and then docked. 
The best solution can be than picked as an answer (Kearsley et al. 1994, Leach 
1994). However, the number of conformations which have to be docked 
independently to achieve an accurate solution may become very large even for 
relatively small compounds. Therefore, many techniques try to treat the flexibility 
of the ligand more directly. Flexible ligand often can be partitioned into rigid 
fragments. For each fragment, rigid docking can produce a number of favorable 
orientations. Fragments are then reassembled into the original chemical structure 
(“Hammerhead” in Welch et al. 1996, Gulukota et al. 1996). Alternatively, one 




fragment is assumed to be essential for binding and placed in the active site first, 
then others are attached incrementally (Bohm 1992, Rarey et al 1996). 

Global optimization of the free energy function with respect to the orientation 
and the conformation of the ligand is, perhaps, the most strict approach. However, 
two features of the protein-ligand energy landscape complicate the problem of the 
energy optimization: high dimensionality and multiplicity of local minima. High 
dimensionality makes the exhaustive search of the conformational space very 
computationally expensive. Large number of local minima makes rational 
determination of the global search direction virtually impossible and limits the 
usability of the derivatives to a small vicinity of one local minimum. In order to 
deal with these difficulties, techniques such as Monte-Carlo minimization 
(Caflisch et al. 1992, Totrov and Abagyan 1997, Trosset and Scheraga 1998), 
Monte-Carlo with simulated annealing (Goodsell and Olson 1990, Goodsell et al. 
1996), and genetic algorithm (Jones et al. 1997) have been applied with various 
success. Some of these methods use internal coordinates to reduce the 
dimensionality of the search space. 

Monte-Carlo 

The term Monte-Carlo has been introduced by Metropolis and Ulam (Metropolis 
et al. 1953), with an allusion to the essentially random nature of such simulations. 
Monte-Carlo minimization consists of three repetitive steps: 

1. Random Jump. One or several variables in the system are changed randomly. 

2. Local Minimization. The energy of randomized conformation is optimized 
using conjugate gradient or quasi-Newton technique to achieve a new local 
minimum. 

3. Evaluation. New conformation is accepted or rejected according to the 
Metropolis criterion. If the energy of ihe new conformation E new is lower than the 



energy of the old one E old , new conformation is always accepted and used in the 
next iteration. Otherwise, it is accepted with the probability of 

P acc =exp(-(E new -Eoid)/kT), where k is Boltzman’s constant and T is the effective 
temperature of the simulation. 

It has been established that a full local minimization after each random step 
greatly improves the efficiency of the procedure (Li and Scheraga 1987, Abagyan 
and Argos 1992). However, some components of the energy, such as solvation 
electrostatic energy, might have no derivatives and/or might be too 
computationally expensive for local minimization. Double-energy MC 
minimization scheme (Abagyan et al. 1994) circumvents this obstacle by using 
two sets of energy terms, one for the local gradient minimization stage and another 
one for the Metropolis criterion evaluation stage in the MC step. Such division can 
be justified if the extra terms included for the Metropolis criterion are relatively 
“slow”, insensitive to small conformational changes. 

Internal coordinates 

One of the principal difficulties in biomolecular simulations is the size of the 
system which often contains thousands of atoms. As a consequence, the 
conformational space has a very high dimensionality, complicating the search for 
the global energy minimum. The use of internal coordinates substantially reduces 
the number of variables defining the conformation of the system. Cartesian 
description requires 3 variables (x,y,z) per atom . Internal coordinates description 
uses bond lengths, planar angles and torsion angles instead. Since bond lengths 
and planar angles are essentially rigid at normal conditions, one can consider them 
as constants and only allow torsion angle changes (rotations around the bonds), 
reducing the dimensionality of conformational space at least threefold. Practically 
even greater reduction is achieved since at cve^y branching point several atoms 
share the same torsion angle (Fig 1). 



The formal geometrical description to allow efficient manipulations of the multi- 
molecular system in internal coordinates with arbitrary subsets of free and fixed 
variables was introduced (Abagyan et al. 1994). The technique represents the 
system as a directed treelike graph imposed on all atoms as well as on some 
auxiliary virtual atoms (Fig 1). Each atom in this basic description has three 
geometric parameters determining its position with respect to the preceding part of 
the tree. The parameters are bond length b, bond angle co and torsion (p or phase (j) 
dihedral angles for the main branch and side branches, respectively. The sub-trees 
of different molecules join in the starting triple of virtual atoms which are fixed at 
the origin of the coordinate system and allow for standard treatment of all real 
atoms including the root atoms of each molecular sub-tree. When several internal 
variables are fixed (considered constant) a group of atoms may form so-called 
rigid-body, where mutual positions of the atoms involved do not change upon any 
changes of the remaining free variables. The concept of rigid bodies provides an 
important additional advantage for the energy calculations, since all pair-wise 
energy contributions from the atoms within a rigid body are constant. Such 
contributions often can be excluded from the calculations when only the relative 
energy change is important, improving the computational performance. 

Other approaches 

Various global optimization techniques were applied to the docking problem. 
Among the more popular is the genetic algorithm (GA), which was widely applied 
in protein folding simulations (Clearwater 1991, Unger and Moult 1993, Dandekar 
and Argos 1994). The idea of GA is to mimic the evolution process by 
manipulating “chromosomes”, each containing a set of variable values defining a 
possible solution, e.g. a certain binding mode. The values inside the 
“chromosome” might be the rotatable torsion angles of the ligand and the variables 
defining the relative orientation of the ligand and receptor. The algorithm starts 




with a random “population” of chromosomes, from which new generations are 
produced by “mutations” and “crossovers”, which involve, respectively, 
randomization of some variables inside the chromosome or reshuffling of some 
variable values between two chromosomes. The best-fit “individuals” are 
preserved while others are discarded according to the fitness function. The 
assumption is that as the algorithm progresses, this strategy will find and keep the 
advantageous combinations of variable values, converging to the minimum of the 
fitness function. The GA docking was used fairly successfully to reconstitute a 
large number of known complexes (Jones et al. 1997), although no tests were 
undertaken to compare its performance with more conventional approaches such 
as MC. 

Notably, Fourier-transform was also used to locate the optimal geometric fit 
(Katchalski-Katzir et al. 1992). The method is efficient and attractively simple 
conceptually. Unfortunately it seems to be only be applicable to a rather simplistic 
fitness function and can only optimize efficiently the three translational degrees of 
freedom. Rotations still need to be sampled by other means, i.e. systematic or 
random search. Fourier-transform approach may be useful primarily in the cases 
where the interacting molecules are very big, making other methods too expensive 
computationally. 

Molecular dynamics (MD) simulation can be used as an optimization method, and 
potentially it can provide a realistic picture of the binding process. However, MD 
is the most computationally expensive approach, and so far it is impossible to 
simulate the whole progress of the system from unbound components to the 
complex. The use of MD in docking is now limited to the simulations of the 
already bound complexes, where it is successfully used to predict various 
thermodynamic properties (Rosenfeld et al. 1995, Miranker and Karplus 1991, DiNola 
et al. 1994, Luty et al. 1995). Somewhat better performance can be achieved using 
so-called Brownian dynamics (Rossky et al. 1978), which was applied to simulate 



long-range diffusion-like motions of the interacting macromolecules (Kozack and 
Subramaniam 1993). 

An example docking study on a set of protein-ligand complexes 

with known 3D structures. 

We tested the ability of internal coordinate MC minimization docking procedure 
to predict the native conformations of protein-ligand complexes using a 
benchmark set of 51 high-resolution structures from PDB. Ligands were diverse in 
size, from 12 to 84 atoms, and had a broad range of chemical properties and 
included sugars, fatty acids, phosphates, bases, heterocyclic and other compounds, 
which insured the applicability of the docking procedure to a large variety of 
receptor/ligand pairs. 

Methods 

Energy 

Our energy estimate used during the docking simulations consisted of the 
following terms: 

E = EpFint+ Evw +Ehb+Ehp+Eel 

AEpFint is the force-field energy which included internal Van der Waals 
interactions and torsion energy for the ligand calculated with ECEPP/3 parameters 
(Nemethy et al. 1992). Since ECEPP/3 only has parameters for amino-acid atom 
types, the atoms of ligands were assigned closest chemically similar atom types. 
The rest of the terms refer to inter-molecular interactions. 

Because of its extreme rigidity, Van der Waals potential in its standard 6-12 
form may introduce large noise in the energy function. For inter-molecular 
interactions we therefore used a modified smoother form of the potential with 




most of the repulsive part truncated. Truncation was achieved by the following 
transformation of the original value of Van der Waals potential: 
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This expression ensures smooth transition from undistorted form of Van der 
Waals potential in the negative range of values to increasingly attenuated form in 
the positive range, asymptotically approaching E max cutoff value. E max was chosen 
on the basis of preliminary tests to be 1.5 kcal/mole. Lower values sometimes 
result in severely clashed docking solutions as the Van der Waals repulsion is no 
longer able to compete with attractive terms, primarily electrostatics. This and 
other interaction potentials were precalculated on a grid to accelerate energy 
evaluation during the simulations. The grid cell size was set to 0.5A. 

AEhb is hydrogen bonding term which was calculated using Gaussian-type 
potential positioned around the center of each lone electron pair of the hydrogen- 
bond acceptors: 

( r~r lp y 

P — /T° p d HB 

t-'HB ~ 

The peak interaction energy E°hb was assumed to be 2.5 kcal/mol as an average of 
various estimates, and the radius of the interaction sphere d HB was assumed to be 
1.4A, allowing for about 30° to 40° deviation from the ideal geometry in 
accordance with observations in X-ray structures. r hb is the radius-vector of the 
interaction center, which was placed 1.7A from the atom. In case of hydrogen 
atoms the center was placed along the axis of the covalent bond attaching the 
hydrogen to the rest of the molecule. In case of heavy sp 2 atoms, one ( for 
nitrogen) or two ( for oxygen ) centers were placed at the angle of 120° to the 
existing covalent bond. For sp J oxygen and sulfur, two centers were placed in 
tetrahedral geometry, at 109° to the existing covalent bonds and to each other. 



Electrostatic term E el used modified Coulomb law with distance dependant 
dielectric constant e=4r. Hydrophobic term Ehp was calculated as roughly 
proportional to the buried hydrophobic surface with the free energy density of 30 
cal/mol/A 2 . To accelerate calculations, a grid-based form of the hydrophobic 
potential was developed. The fragments of the solvent-accessible surface were 
generated using the modified Shrake and Rupley algorithm (Shrake and Rupley 
1973, Abagyan et al. 1994). The algorithm produces dots which evenly cover the 
surface. The hydrophobic potential on the grid was then calculated as: 

J7 — J7° P d w 

^ HP ~ u HP e 

d surf is the distance to the closest point of the hydrophobic surface, and d w is 
effective radius of the hydrophobic interaction which was set to the diameter of 
the water molecule 2.8 A. The value of E°hp=3 kcal/mole was chosen to 
approximate the surface tension of 30 cal/mol/A 2 for extended hydrophobic 
surfaces in test cases. 

Conformational search procedure 

All ligand /receptor pairs in the set were docked using flexible Monte-Carlo 
docking procedure with potential maps as implemented in ICM software (Totrov 
and Abagyan 1997, Abagyan et al. 1994, Abagyan and Totrov 1994). The ICM 
method describes both the relative positions of two molecules and their 
conformations by a uniform set of internal variables. Any subset of internal 
variables can be subjected to local or global energy minimization procedures. In 
this study, the global Monte-Carlo minimization procedure similar to previously 
described (Totrov and Abagyan 1997, Abagyan et al. 1994) was used. It involved 
random conformational change of two possible types: positional Pseudo-Brownian 
random move or internal torsion modification, followed by local energy 
minimization (up to 100 steps of conjugate gradient minimization) and selection 
by the Metropolis criterion (temperature factor was set to 600K). Pseudo- 



Brownian random moves changed the position of the ligand molecule as a whole 
with a certain amplitude (here we used 2A), as well as randomly rotated it around 
it’s center of gravity by an angle close to the translation amplitude over the radius 
of gyration. Internal torsion angles of the ligand were randomly changed one at a 
time, with the amplitude of 180°. 

Geometrically different (as evaluated by the root mean square displacement of 
the ligand atoms) and low energy conformations were accumulated in the 
conformational stack (Abagyan and Argos 1992). Adaptive length of the MC runs 
was used, with the limit on the total number of steps proportional to the size 
(number of atoms) of the ligand: N M csteps=50*N L igAtom- Similarly, an adaptive 
length of local minimization during the MC run was used: 
NLocMmSteps=25+N L jgAtom- The factors in these relations were established 
empirically from the convergence and efficiency considerations. 

Test data set 

The set of 51 complexes (Table 1.) was extracted from high-resolution PDB 
structures. The structures were selected according to a number of criteria: We 
discarded all structures at resolutions worse than 2.0A since large errors in the 
receptor coordinates could result in poor docking and recognition for reasons 
unrelated to our study. Some complexes had the ligand bound covalently to the 
receptor and were also discarded since the prediction of such chemical reactions is 
beyond the scope of our approach. We also omitted complexes where metal ions 
were directly involved in the protein-ligand interaction since the force field used in 
the simulations did not provide for adequate modeling of such atoms. For a 
number of receptors structures of several complexes with different ligands were 
available. In such cases we used a single receptor structure in docking experiments 
with nil ligands. Hydrogen -toms were added to all X-ray structures using the 
hydrogen placement algorithm of ICM software (Abagyan et al. 1994). Electric 



charges were assigned to the atoms of the ligands using bond-charge increment 
algorithm from MMFF94 force field (Halgren 1995). 

Results and discussion 

51 complexes with known structures were predicted. Only the best-energy 
conformation in each case was retained and compared to the experimental 
structure. 35 predictions were within 3A from the native structure, producing 
correct overall positioning of the ligand, and 26 were within 2A, giving fairly 
detailed picture of the receptor-ligand interaction (Table 1, Fig.2). As expected, 
good precision is achieved for tighter, enclosed binding pockets, while for more 
loose, open binding sites such as in phospholipase, FK506 binding protein or fatty 
acid binding protein the prediction quality is often marginal. Single simulation 
took from 2 to 12 min CPU time, which illustrates the advantage of pre-calculated 
grid potentials, since similar simulations with full-atom receptor molecule take 
several hours (Totrov and Abagyan 1997). 

The results show that docking techniques, such as flexible docking in internal 
coordinates using grid potential representation of the receptor molecule, in the 
majority of cases can produce a model of protein-ligand interaction with the 
precision allowing its use in applications such as drug design. However, an 
important condition for the current docking methods is the relative rigidity of the 
binding site. Reliable ways to treat receptor flexibility are yet to be developed. The 
growth in the available computer power and improvement in simulation 
techniques should ultimately allow detailed predictions of flexible receptor and 
ligand interaction. 
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Table 1 Ligands/receptor pairs used in docking simulations 


Sourc 

e 

(PDB 

code) 

Receptor 

Ligand 

RMSD 

A 

lhrnr 1 

Fatty acid binding 
protein 

elaidic acid 

4.89 

lhms 

Fatty acid binding 
protein 

oleic acid 

6.69 

lhmt 

licm 1 

Fatty acid binding 
protein 

stearic acid 

3.49 

Intestinal fatty acid 
binding protein 

Myristate 

1.79 

lien 

4dfr 1 

Intestinal fatty acid 
binding protein 

oleic acid 

1.46 

DHFR 

Methotrexate 

1.81 

idyj 

DHFR 

5,10-dideazatetrahydrofolate 

2.84 

ldyh 

DHFR 

5-deazafolate 

2.48 

ldyi 

DHFR 

Folate 

3.45 

ljom 

DHFR 

folinic acid 

5.00 

Itbs 1 

Trypsin 

Benzamidine 

1.91 

ltng 

Trypsin 

Aminomethylcyclohexane 

1.27 

ltnh 

Trypsin 

4-fluorobenzylamine 

1.94 

ltni 

Trypsin 

4-phenylbutylamine 

3.05 

ltnj 

Trypsin 

2-phenylethylamine 

2.79 

ltnk 

Trypsin 

3-phenylpropylamine 

2.60 

ltnl 

Trypsin 

Tranylcypromine 

2.11 

1881 1 

Lysozyme mutant 

o-xylene 

0.42 

1851 

Lysozyme mutant 

Indole 

1.33 

1841 

Lysozyme mutant 

Isobutylbenzene 

4.53 

1871 

Lysozyme mutant 

p-xylene 

1.75 

1861 

Lysozyme mutant 

n-butylbenzene 

1.62 

1811 

Lysozyme mutant 

Benzene 

0.80 

1831 

Lysozyme mutant 

Indene 

0.52 

1821 

Lysozyme mutant 

Benzofuran 

0.43 

lerb 1 

Retinol binding protein 

n-ethyl retinamide 

0.99 

lfel 

Retinol binding protein 

Fenretinide 

2.21 

lfem 

Retinol binding protein 

Retinoic acid 

2.35 

Ifen 

lsre 1 

Retinol binding protein 

Axerophthene 

0.93 

Streptavidin 

Haba 

1.50 





lsrg 

Streptavidin 

3 -methyl-haba 

7.24 

lsri 

Streptavidin 

3 ’,5 -dimethyl-haba 

7.53 

lsrj 

Streptavidin 

Naphthyl-haba 

0.76 

lgar 

Glycinamide 

ribonucleotide 

transformylase 

Burroughs-Wellcome inhibitor 
1476u89 

2.19 

lfnd 

Ferredoxin reductase 

FAD 

8.87 2 

lake 

Adenylate kinase 

Inhibitor ap5a 

0.98 

lrcf 

Flavodoxin 

flavin mononucleotide 

1.2 

lmrj 

a-momorcharin 

Adenine 

3.51 

lmrg 

a-trichosanthin 

Adenine 

0.42 

lmdq 

Maltodextrin-binding 

protein 

Maltose 

0.92 

lgca 

Glucose/galactose¬ 
binding protein 

Galactose 

1.27 

2dri 

D-ribose-binding protein 

beta-d-ribose 

0.56 

list 

Lysine-, arginine-, 
ornithine-binding protein 

Lysine 

0.61 

lhsl 

Histidine-binding protein 

Histidine 

1.68 

lars 

Aspartate 

aminotransferase 

Pyridoxal-5 -phosphate 

2.37 

lfkh 

FK506 binding protei 

(1 r)-1 -cyclohexyl-3-phenyl-1 - 
propyl (2s)-l-(3,3-dimethyl- 1,2- 
dioxopentyl)-2- 
piperidinecarboxylate 

2.27 

lmai 

Phospholipase c 8-7 

Inositol trisphosphate 

5.03 

lnsc 1 

Neuraminidase 

sialic acid 

0.93 

lnsd 

Neuraminidase 

2,3-dehydro-2-deoxy-n-acetyl 
neuraminic acid 

0.75 

livd 

Neuraminidase 

4-(acetylamino)-3 -hydroxy-5 - 
nitrobenzoic acid 

0.83 

lfgi 

FGF receptor kinase 

Inhibitor SU5402 

0.76 


1. Structure of the receptor from this PDB entry was used in all docking simulations 
involving the same protein. 

2. Flavine nucleotide moiety was docked well, while the other half (adenosine) deviates 
significantly from its native position. 
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Fig. 1(a) Four types of. internal variables considered in ICM. (b) The ICM tree 
representing the geometry of multimolecular arbitrarily fixed system and 
containing both real atoms and bonds (continuous lines) and virtual ones (dot- 
dashed lines). Atoms are numbered so that any atom in the directed graph starts a 
subtree with a continuous numeration. An arbitrary subset of free internal 
variables is shown in bold black characters, all the others being fixed (gray 
characters). The atomic regular directed graph is the basic one, the orderof 
variables and rigid bodies following it. The numeration does not change as a result 
of refixation and redefinition of the rigid bodies. The attribution of the main 
(torsion) branch at the branching point is arbitrary and does not necessarily follow 
the atomic numeration. 



Fig. 2(a,b) Comparison of predictions and experimentally determined structures 
for 53 protein-ligand complexes. 
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Abstract 

We have tested several programs for flexible molecular docking: 
DOCK 4.0, FlexX 1.8, AutoDock 3.0 and ICM 2.8. This was done by 
doing two kinds of studies: docking experiments on a data set of 37 
protein-ligand complexes and screening a library containing 10,037 
entries against 11 different proteins. The docking accuracy of the 
methods was judged based on the corresponding rank 1 solutions. We 
found that on average the solutions produced by ICM have the highest 
accuracy. All other methods are less accurate and their predictions are 
of approximately the same accuracy. The efficiency of docking, which 
measure a computational cost of achieved accuracy, is very low in case 
of AutoDock and approximately the same for all other methods. The 
database screening was performed using DOCK, FlexX and ICM. In 
19 cases ICM is able to find original ligands within the top 1 % of the 
total library. The corresponding number for DOCK and FlexX is 8 
and 11 respectively. 



1 Introduction 


Molecular docking has become a useful tool in drug discovery efforts. The 
screening of large databases for possible lead compounds is nowdays becom¬ 
ing a routine procedure. Until recently the molecular docking was performed 
using a single conformation for each ligand. This approach, which treats lig¬ 
and as a rigid body, is CPU benign and thus satisfies a key requirement that 
docking algorithm should not spend more than few minutes per compound 
while searching the database. However, the constraints introduced by fixing 
the internal degrees of freedom of the ligand, although advantageous in terms 
of computational cost, could have a negative impact on the ability to make 
a valuable prediction. In particular, the freezing of internal motion may pre¬ 
vent ligand from adopting a conformation that would fit better into a binding 
site of a receptor. Thus efforts has been advanced to develop algorithms for a 
flexible ligand docking. Based on the approach to the conformational search 
of flexible ligand, we can group the algorithms in the following two major 
classes: algorithms, which try to fit the ligand into the binding pocket of 
the protein by matching (geometrically, chemically, energetically etc.); and 
algorithms, which find an optimum ligand conformation by solving a global 
energy optimization problem. In this study we attempt to evaluate the per¬ 
formance of several programs employing algorithms of both classes. The first 
group of algorithms is represented by programs DOCK and FlexX, and the 
second group is represented by programs AutoDock and ICM. All these pro¬ 
grams are publically available and were obtained by us under either academic 
(DOCK 4.0, AutoDock 3.0, and ICM 2.8) or demo (FlexX 1.8) licenses. In 
order to compare the performance of selected programs, we undertook two 
kinds of studies: cross-docking experiments involving protein-ligand com¬ 
plexes with different ligands but the same protein and database screening 
against the same proteins. We have chosen to perform the cross—docking ex¬ 
periments because they have the same underlying principle as the database 
screening experiments: in both cases different ligands are being fitted into 
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the same receptor. Alternatively, we could have used an uncomplexed pro¬ 
teins for docking experiments. However, this approach is problematic since 
the crystal structure of apoproteins is rarely available and in addition one 
would need to employ flexible receptor description. The dataset for docking 
experiments included 11 groups of complexes, each group containing between 
two and eight members. The library for screening experiments contains lig¬ 
ands from docking dataset and additional 10,000 molecules randomly selected 
from a database of commercially available compounds (ACD) distributed by 
MDL. 

The outline of the paper is as follows. First we briefly review the algo¬ 
rithms employed by the programs. Next we describe input data preparation 
and docking protocols. Finally we present results on docking and screening 
experiments. 

2 Algorithms 

Here we briefly outline the docking algorithms employed by AutoDock, DOCK, 
FlexX, and ICM. The reader is referred to the original papers for a detailed 
account. 

AutoDock 

AutoDock explores the conformational space of the ligand using the Lamarkian 
genetic algorithm (LGA), which is a hybrid of a genetic algorithm (GA) 
with an adaptive local search (LS) method. 1 In this approach the ligand s 
state is represented as a chromosome, which is composed of a string of real¬ 
valued genes, describing ligand’s location (three coordinates), orientation 
(four quaternions) and conformation (one value for each torsion). The simu¬ 
lation is started by creating a random population of individuals. It is followed 
by a specified number of generation cycles, each consisting of the following 
steps: mapping and fitness evaluation, selection, crossover, mutation and 
elitist selection. Each generation cycle is followed by a local search. The 
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solutions are scored using the energy scoring finction, which include terms 
accounting for short-ranged van der Waals and electrostatic interactions, loss 
of entropy upon ligand binding, hydrogen bonding and solvation. 

DOCK and FlexX 

Both DOCK 2 and FlexX 3,4 employ the incremental reconstruction algo¬ 
rithm. In this algorithm rigid anchor (DOCK) or base (FlexX) fragments 
are identified first. At the next step the selected fragment is placed into 
the active site of the receptor using sphere-matching procedure (DOCK) or 
hashing technique (FlexX). The complete ligand is constructed by adding 
the remaining components step by step. At each step of reconstruction a 
specified number of optimal partial solutions are selected for the next ex¬ 
tension step. In DOCK the solutions are scored using energy, contact or 
chemical scoring functions. The energy scoring function, which was used in 
this study, includes van der Waals and electrostatic components. In FlexX 
the scoring is done using a modified Bohm scoring function, which includes 
the following terms: entropic, which accounts for loss of entropy upon lig¬ 
and binding; hydrogen bonding; ionic, acounting for electrostatic interac¬ 
tions; aromatic, which accounts for interactions between aromatic groups; 
and lipophilic, which accounts for hydrophobic interactions. All terms, ex¬ 
cept the entropic, are scaled by the corresponding heuristic distance and 
angle dependent penalizing functions. 

ICM 

ICM performs flexible docking via Monte Carlo global optimization of 
the effective energy function in the internal coordinate space of the flexible 
ligand and flexible receptor. 5-7 The effective energy function includes the 
following terms: intramolecular van der Waals and torsion energy; modified 
intermolecular van der Waals interaction energy; hydrogen bonding term; 
term accounting for hydrophobic interactions; and electrostatic term using 
modified Coulomb law with distance dependent dielectric constant e = 4r. 
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The Monte Carlo procedure employed by ICM includes two types of moves: 
a pseudo-brownian positional move and a biased-probability multitorsion 
move. Each move is followed by full local energy minimization. Pseudo- 
Brownian random move changes the ligand position by moving the entire 
molecule with a certain amplitude and rotating it randomly around its center 
of mass by a certain angle. Biased-probability multitorsion move can be used 
to implement the flexible description of the receptor side-chains, however 
this capability was not invoked in the present study. Global optimization is 
performed starting from multiple starting points. The solutions are scored 
using the effective energy function. 

3 METHODS 

Input data preparation and algorithms comparison methodology 

(1) The proteins which we have chosen for docking and database screening 
experiments, satisfy the following criterea: they have at least two entries with 
different ligands in the protein databank (PDB); they do not form covalent 
bonds with their respective ligands; majority of their ligands have relatively 
large number of rotatable bonds (see Table 1). 

(2) The ligand input files were prepared according to the following proce¬ 
dure. First, we extracted ligand coordinates from the the corresponding PDB 
file and assigned chemical bonds, partial charges and added hydrogen atoms 
using ICM. All carboxylic acid and phosphoric acid groups were ionized and 
all amino-groups were protonated. Next, all torsion bonds were randomized 
and local minimization was performed. After that the ligand coordinates 
were modified in such a way that its center of geometry was superimposed 
with that of the reference ligand. Finally, the ligand coordinates were written 
into MOL2 and PDB format files. 

(3) The receptor input files were prepared according to the following 
precedure. First, we removed from the corresponding PDB file all water 
molecules, ligand atoms and those ions, which did not belong to the active 
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site of the receptor. Next hydrogen atoms were added and partial charges 
were assigned using ICM. This was followed by a local minimization. Finally, 
the receptor coordinates were written into MOL2 and PDB format files. 

(4) The docking experiments were performed on the same computer and 
CPU time required for docking was recorded. The docking protocols are 
described in the next section. We emphasize here, that all methods use the 
same receptor coordinates and start with exactly the same initial location, 
conformation and orientation of the ligands. The length of the docking ex¬ 
periments was controlled by the default or recommended parameter settings. 
We observed that doubling the length of the docking experiments does not 
improve significantly the accuracy of the solutions. 

(5) For each docking method only the best scoring solution per com¬ 
plex was saved. Different algorithms were compared based on the root-mean 
square deviation (RMSD) of heavy atoms of the best predicted structures 
from the corresponding crystal structures. If the ligand has local topological 
symmetry at single bonds, whose torsion angle can be changed by a rotation 
of less than 360° without changes in the global conformation of the ligand, 
the RMSD of alternative orientations was calculated and the smallest one 
was kept for the purpose of comparison of different algorithms. The coor¬ 
dinates of ligand structure, used for RMSD calculations, were obtained by 
superimposing its the crystallographic coordinates protein coordinates with 
the receptor coordinates used for the docking. 

(6) In order to quantify ligand docking quality and compare performance 
of different methods, we introduced the docking acciracy (DA) function, 
which makes use of RMSD values and measures how accurately the ligands 
- members of a particular group, - are docked by a given method: 


DA — frmsd <2 H“ 0.5 {frmsd <3 frmsd< 2 ) j (l) 

where f rm sd<a indicates the fraction of ligands docked into a given receptor 
with RMSD less or equal a A. The docking accuracy of the method for a 
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particular receptor is zero if f rm sd< 3 is zero. 

Docking protocols 

In all algorithms studied here, the receptor is treated as a rigid body and 
a grid potential is used to evaluate the scoring functions. This simplifica¬ 
tion allows to perform docking more efficiently, which is especially crucial in 
database screening. 

AutoDock 

AutoDock requires the receptor and ligand coordinates in MOL2 format. 
Nonpolar hydrogen atoms were removed from the receptor file and their 
partial charges were added to the corresponding carbon atoms. The program 
Mol2topdbqs was used to transform receptor MOL2 file into PDBQS format 
file containing the receptor atoms coordinates, partial charges and solvation 
parameters. The program AutoTors was used to transform ligand MOL2 file 
into PDBQ file, merge nonpolar hydrogen atoms and define torsions. The 
grid calculations were setup with utility Mkgpf3 and maps were calculated 
with program Auto Grid. The grid maps were centered on the ligand’s binding 
site and they were 61 x 61 x 61 points. The default parameters setting, 
generated by program MkdpfS, was used for docking. For each complex 10 
dockings were performed. The initial population was set to 50 individuals; 
ma ximum number of energy evaluations was 2.5 x 10 5 ; maximum number 
of generations was 27,000. The other parameters provided by the default 
setting were the same as in Ref. [1]. 

DOCK 

DOCK requires the following receptor files: MOL2 format file containing 
coordinates of all atoms; PDB file containing heavy atoms coordinates only; 
and PDB file containing heavy atoms excluding those of the active site. The 
active site atoms included those receptor atoms which were within 6.5 A from 
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the reference ligand atoms. The ligand coordinates were provided in MOL2 
format. The site points for the ligand docking were identified using SPHGEN 
program. The number of docking points did not exceed 50. Energy score 
was employed for orientational and conformational search. Grid maps were 
calculated using Grid program, with grid spacing of 0.5 A. The energy cutoff 
distance of 10 A was employed. Electrostatic interaction were calculated 
with distance depending dielectric constant. The dielectric factor was set 
to 4. Proteins were represented by a united atom model. Flexible bonds 
and anchors were automatically identified by the DOCK. Conformational 
search was done with torsion drive. The clash overlap set to 0.5. Top 25 
conformations were retained during each cycle of the search. Multiple anchors 
were allowed, with minimum number of heavy atoms in the anchor set to 10. 
Orientational search was performed with an automated matching. Maximum 
number of orientations was 500 for docking experiments and 100 for database 
search. The local energy minimization of orientations and conformations of 
ligand and anchor was performed. The ligand reminimization was turned on. 
The default minimization parameters were employed. 

FlexX 

FlexX requires MOL2 format file for the ligand and PDB format file 
for the receptor. The default settings as provided with FlexX 1.8 package 
were used for flexible docking and database screening. The conformational 
flexibility of the ligand is modeled by a discrete set of preferred torsional 
angles for acyclic single bonds. The rings were considered rigid, since the 
program CORINA for treating multiple conformations of the rings was not 
included in the distribution. The active site and the interaction surface 
of the receptor were defined by placing a reference ligand and using 6.5 A 
cutoff distance. Base fragments were selected automatically. The maximum 
number of base fragments was 4. The base fragment was placed into the 
active site by using two algorithms. The first one superimposes triples of 
interaction centers of a base fragment with triples of compatible interactions 
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in the active site. The second algorithm, called matching, is used when 
the base fragment had fewer than three interaction centers. The maximum 
number of solutions retained for the next iteration step was 400. 

ICM 

Grid maps were calculated with a grid spacing of 0.5 A. Docking was 
performed with a default script provided by ICM. During the docking, either 
one of the torsional angles of the ligand was randomly changed or pseudo- 
Brownian move was performed. Each random change was followed by 100 of 
local conjugate-gradient minimizaion. The new conformation was accepted 
or rejected according to Methropolis rule using temperature of 600 K. The 
length (number of Monte Carlo steps) of docking run as well as the length 
of local minimiztion length was determined automatically by the adaptive 
algorithm, depending on the size and number of flexible torsions in the ligand. 

4 RESULTS 

All docking experiments and database screening were performed on a SGI 
10000 equipped with a single 195 MHz IP28 processor and 128 MB main 
memory. 

Docking experiments 

The following receptors were used for docking experiments: trypsin (PDB 
entry 3ptb), cytochrome P-450 ca m (PDB entry lphf), neuraminidase (PDB 
entry lnsc), carboxypeptidase A (PDB entry lcbx), L-arabinose binding pro¬ 
tein (PDB entry labe), e-thrombin (PDB entry letr), thermolysin (PDB 
entry 3tmn), pencillopepsin (PDB entry lapt), intestinal fat-acid binding 
protein (PDB entry licm), ribonuclease Ti (PDB entry lgsp), and carbonic 
anhydrase II (PDB entry lcil). Most receptors have three ligand members 
with with exception of trypsin, which has 8 members and penicillopepsin, 
which has 2 members. 
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The complexes for which cross-docking experiments were performed and 
the docking results, such as RMSD values and CPU times, are given in 
Table 1. The docking accuracies of studied programs are summarized in 
Table 2. We see that ICM is the most accurate in predicting correct protein- 
ligand conformations. It gets perfect score of one for 5 receptors, which means 
that all members of those receptors are docked within RMSD less than 2 A 
from the corresponding crystal structures. Other methods are much less 
accurate in their predictions. In particular, we observe that there are several 
receptors for wich they fail to produce any acceptable solution at all. Those 
are e-thrombin, thermolysin and carbonic anhydrase II in case of AutoDock; 
neuraminidase, carboxipeptidase and pensillopepsin in case of DOCK; and 
thermolysine, pensillopepsin and carbonic anhydrase II in case of FlexX. On 
average the docking accuracy of AutoDock and FlexX is approximately the 
same and that of DOCK is slightly worse. 

As evident from Table 2 the average docking time increases in the follow¬ 
ing order: FlexX, DOCK, ICM and AutoDock. The low docking speed of 
AutoDock suggests that at the present time it is not suitable for database 
screening on a single processor computer. We note, however, that compared 
to overall cost in the drug development process the computational cost is 
of lesser importance. Moreover, it is deemed to reduce owing to rapid de¬ 
velopments in computer industy. Thus when comparing different docking 
algorithms more emphasis should be attached to their accuracy rather than 
the computational cost. With this in mind we conclude based on the results 
of docking experiments that the Internal Coordinates Method has the best 
docking performance among studied algorithms. 

Before proceeding to the library screening results, we note that in all 
docking experiments we used receptor and ligand MOL2 files generated by 
ICM. However, according to AutoDock, DOCK and FlexX manuals, it is 
recommended to use SYBYL to generate the ligand and receptor (AutoDock 
and DOCK) input files. The only difference between input files generated 
by SYBYL and ICM is in partial atomic charges. ICM uses bond charge 
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increment method from MMFF94 (Halgren) to assign partial atomic charges. 
In order to be certain that our results are not influenced by partial charges 
provided by ICM, we repeated docking experiments for AutoDock, DOCK 
and FlexX using charges generated by SYBYL: Kollman charges for receptor 
atoms and Gasteiger charges for ligand atoms. Our findings indicate that 
the accuracy of AutoDock, DOCK and FlexX with SYBYL charges was the 
same as with ICM charges. 

Database screening 

The same proteins that were used for docking experiments have been 
chosen to perform screening of a ligand library. As was mentioned in intro¬ 
duction, this library contains ligands that were used for docking experiments 
and additional 10,000 molecules selected randomly from ACD library. The 
purpose of the screening experiments was to find out how well the programs 
would distinguish the original ligands of the complexes among all database 
molecules. This has a tremendous practical implications, since good dock¬ 
ing algorithm allows to cut significantly the cost of drug discovery process 
by reducing the fraction of compounds of the ligand library that should be 
analyzed experimentally as potential drug candidates. In order to quantify 
the database screening we use the following virtual screening (VS) function: 


VS = /<i + 0.5 (/<5 — /<i), (2) 

where f< a indicates the fraction of original ligands found within top a % of 
scanned solutions. The virtual screening function varies between 0 and 1. 
The value VS for a particular receptor molecule is 1 if all of its ligands are 
found within top 1 % of scanned solutions and 0 if no original ligands are 
found within top 5 % of scanned solutions. The results of screening performed 
by DOCK, FlexX and ICM are summarized in Table 3 and the VS values 
are given in table 4. First we note that out of 37 original ligands ICM 
places 19 ligands within top 1 % scanned solutions, while the corresponding 
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number for DOCK and FlexX is 8 and 11 respectively. The easiest receptor 
for screening experiments is L-Arabinose, since all algorithms assign very 
high scores (VS equals 1) to its original ligands. There are also receptors 
for which algorithms fail to place original ligands even within 5 % of top 
scoring solutions. There are 4 such receptors in case of DOCK, 3 receptors 
in case of FlexX and 2 receptors in case of ICM. The average value of the 
virtual screening function is 0.30, 0.32 and 0.6 for DOCK, FlexX and ICM 
respectively. This result suggest, that it is sufficient to consider top 5 % of 
best scoring solutions produced ICM as potential drug candidates, while in 
case of DOCK and FlexX more than 5 % of top scorers should be taken for 
experimental verification. 

As a sidenote, we found that the database screening results with DOCK 
are improved somewhat if the chemical scoring function is employed instead 
of energy score. This contrasts the results of docking experiments, where no 
influence of the the choice of scoring function was found. 

5 CONCLUSIONS 

Our results indicate that ICM outperforms other docking programs in both 
docking and database screening experiments. 


12 



References 

[1] G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. 
Belew, and A. J. Olson J. Comp. Chem., 19, 1639, (1998). 

[2] S. Makino and I. D. Kuntz, J. Comp. Chem., 18, 1812, (1997). 

[3] M. Rarey, B. Kramer, T. Lengauer, and G. Klebe, J. Mol. Biol., 261, 
470, (1996). 

[4] B. Kramer, M. Rarey, and T. Lengauer, Proteins, 37, 228, (1999). 

[5] R. Abagyan, M. Totrov, and D. Kuznetsov, J. Comp. Chem., 15, 488, 
(1994). 

[6] M. Totrov, and R. Abagyan, Proteins, Suppl. 1, 215, (1997). 

[7] M. Totrov, and R. Abagyan, in The Thermodynamics of Protein-Ligand 
Interactions] ???., Eds.; Wiley and Sons: New York, 2000 (in press). 


13 


Table 1 Results of the Docking Experiments 


Ligand 

Nrot a 

AutoDock 6 

DOCK 6 

FlexX 5 

ICM 6 

Trypsin 

3ptb 

3 

0.80 (323) 

0.59 (23) 

1.11 (15) 

0.49 (77) 

ltng 

2 

0.62 (270) 

0.86 (20) 

1.08 (5) 

0.71 (79) 

ltnj 

3 

1.21 (290) 

1.56 (22) 

1.73 (35) 

2.17 (21) 

ltnk 

4 

1.69 (330) 

1.87 (29) 

1.70 (11) 

2.53 (26) 

ltni 

5 

2.61 (367) 

5.26 (35) 

2.73 (12) 

3.40 (39) 

ltnl 

1 

0.41 (300) 

2.08 (25) 

3.74 (30) 

1.93 (17) 

ltpp 

7 

1.80 (330) 

3.25 (46) 

1.95 (47) 

1.71 (53) 

Ipph 

11 

5.14 (920) 

3.91 (212) 

3.27 (51) 

1.44 (207) 

Cytochrome P-450 ca m 
lphf 

1 

2.09 (254) 

2.39 (25) 

4.68 (72) 

1.23 (28) 

lphg 

5 

3.52 (390) 

5.57 (39) 

4.87 (169) 

0.46 (57) 

2cpp 

3 

3.40 (230) 

2.48 (22) 

0.44 (6) 

2.53 (34) 

Neuraminidase 

lnsc 

12 

1.40 (610) 

4.86 (97) 

6.00 (57) 

1.80 (119) 

lnsd 

11 

1.20 (600) 

4.51 (110) 

1.56 (88) 

1.04 (70) 

lnnb 

11 

0.92 (650) 

4.51 (88) 

0.92 (71) 

1.09 (108) 

Carboxypeptidase A 
lcbx 

5 

1.33 (354) 

3.13 (45) 

1.32 (24) 

0.82 (40) 

3cpa 

8 

2.26 (512) 

6.48 (56) 

1.51 (172) 

0.77 (51) 

6cpa 

16 

8.30 (1007) 

8.30 (163) 

9.83 (81) 

1.60 (350) 

L-Arabinose binding protein 
labe 

4 

0.16 (340) 

1.87 (32) 

0.55 (30) 

0.36 (38) 

labf 

5 

0.48 (320) 

3.25 (36) 

0.76 (35) 

0.61 (37) 

5abp 

6 

0.48 (400) 

3.89 (43) 

4.68 (29) 

0.88 (42) 

e-Thrombin 

letr 

15 

4.61 (1153) 

6.66 (371) 

7.26 (104) 

0.87 (444) 

lets 

13 

5.06 (1366) 

3.93 (522) 

2.11 (69) 

6.22 (344) 

lett 

11 

8.12 (1003) 

1.33 (371) 

6.24 (72) 

0.99 (219) 
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Continuation of Table 1. 


Thermolysin 

3tmn 

10 

4.51 (630) 

7.09 (107) 

5.30 (67) 

1.36 (99) 

5tln 

14 

5.34 (711) 

1.39 (140) 

6.33 (62) 

1.42 (196) 

6tmn 

20 

8.72 (1027) 

7.78 (262) 

4.51 (67) 

2.60 (420) 

Penicillopepsin 

lapt 

30 

1.89 (1242) 

8.06 (416) 

5.95 (76) 

0.88 (700) 

lapu 

29 

9.10 (1002) 

7.58 (409) 

8.43 (78) 

2.02 (590) 

Intestinal FABP 
licm 

13 

1.80 (583) 

3.99 (112) 

2.94 (31) 

1.11 (154) 

lien 

17 

3.99 (583) 

3.88 (166) 

2.95 (42) 

1.35 (314) 

2ifb 

15 

3.09 (513) 

1.43 (135) 

8.94 (14) 

1.04 (234) 

Ribonuclease Ti 
lgsp 

4 

2.67 (592) 

1.16 (59) 

3.71 (44) 

0.54 (59) 

lrhl 

7 

0.96(710) 

0.71 (72) 

1.15 (51) 

3.53 (78) 

Iris 

7 

0.98 (703) 

1.75 (76) 

4.33 (72) 

0.79 (77) 

Carbonic Anhydrase II 
lcil 

6 

5.81 (460) 

2.78 (63) 

3.52 (87) 

2.00 (58) 

lokl 

5 

8.54 (396) 

5.65 (38) 

4.22 (105) 

3.03 (42) 

lenx 

13 

10.9 (700) 

7.35 (63) 

6.83 (72) 

2.09 (176) 


a) Number of rotatable bonds in the ligand, 
b) First number is RMSD values in A; the second number (in parentheses) 
is docking.time in seconds. 
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Table 2 Docking accuracies and average docking times (seconds) of algorithms 


Receptor 

AutoDock 

DOCK 

FlexX 

ICM 

Trypsine 

Cytochrome P450 ca m 

Neuraminidase 

Carboxypeptidase 

L-Arabinose 

e-Trombin 

Thermolysin 

Pencillopepsin 

Intestinal Fat-Acid 
Ribonuclease Tj 
Carbonic Anhydrase II 

0.81 (391.2) 
0.17 (291.3) 
0.67 (620) 
0.50 (624.3) 
1.00 (353.3) 

0 (1174) 

0 (789.3) 
0.50 (1122) 
0.33 (559.7) 
0.83 (668.3) 

0 (518.7) 

0.57 (51.5) 
0.33 (28.7) 

0 (98.3) 

0 (88) 

0.33 (37) 
0.33 (421.3) 
0.33 (169.7) 

0 (412.5) 
0.33 (137.7) 
1.00 (69) 
0.17 (54.7) 

0.73 (25.7) 
0.33 (82.3) 
0.67 (72) 
0.67 (92.3) 
0.67 (31.3) 
0.17 (81.7) 

0 (65.3) 

0 (77) 
0.33 (29) 
0.33 (55.7) 

0 (88) 

0.75 (64.9) 
0.83 (39.7) 
1.00 (99) 
1.00 (147) 
1.00 (39) 
0.67 (335.7) 
0.83 (238.3) 
1.00 (645) 
1.00 (234) 
0.67 (71.3) 
0.67 (92) 

Average 

0.44 (646.5) 

0.31 (142.6) 

0.39 (63.7) 

0.93 (182.3) 
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Table 3 Results of the Virtual Database Screening 


Ligand 

DOCK 0 

FlexX° 

ICM° 

Trypsin 



9928 

15 (0.1) 

3ptb 

2014 

(20.3) 

1391 (14) 

ltng 

2855 

(28.7) 

3056 (31) 

27 (0.3) 

ltnj 

4478 

(45.1) 

4757 (48) 

106 (1.1) 

ltnk 

5771 

(58.1) 

3685 (37) 

1463 (14.6) 

ltni 

4330 

(43.6) 

3821 (38) 

60 (0.6) 

ltnl 

8138 

(81.9) 

3880 (39) 

5181 (51.6) 

ltpp 

393 

(4.0) 

27 (0.3) 

45 (0.4) 

Ipph 

37 

(0.4) 

63 (0.6) 

32 (0.3) 

Cytochrome P-450 cam 



9928 

1010 (10.1) 

lphf 

2676 

(26.9) 

1309 (13.2) 

lphg 

6022 

(60.6) 

141 (1.4) 

1280 (12.7) 

2cpp 

724 

(7.3) 

1250 (12.6) 

1171 (11.7) 

Neuraminidase 



8418 


lnsc 

1893 

(19.0) 

611 (7.2) 

503 (5.01) 

lnsd 

933 

(9.4) 

252 (3.0) 

44 (0.44) 

lnnb 

664 

(6.7) 

116 (1.4) 

43 (0.43) 

Carboxypeptidase A 



8951 

23 (0.2) 

lcbx 

3656 

(36.8) 

240 (2.7) 

3cpa 

1411 

(14.2) 

122 (1.4) 

32 (0.3) 

6cpa 

842 

(8.5) 

246 (2.7) 

360 (3.4) 

L-Arabinose binding protein 



7593 

1 (0.01) 

labe 

52 

(0.5) 

19 (0.2) 

labf 

119 

(1.2) 

15 (0.2) 

3 (0.03) 

5abp 

21 

(0.2) 

29 (0.4) 

2 (0.02) 

e-Thrombin 



5579 

157 (1.6) 

letr 

556 

(5.6) 

23 (0.4) 

lets 

38 

(0.4) 

13 (0.2) 

672 (6.7) 

lett 

90 

(0.9) 

485 (8.7) 

6 (0.06) 




Continuation of Table 3. 


Thermolysin 


8353 


3tmn 

3412 (34.3) 

289 (3.5) 


5tln 

6554 (66.0) 

53 (0.6) 


6tmn 

1494 (15.0) 

707 (8.5) 


Penicillopepsin 


8778 

110 (1.1) 

lapt 

28 (0.3) 

3637 (41) 

lapu 

377 (3.8) 

7036 (80) 

1286 (12.8) 

Intestinal FABP 


5903 


licm 

1008 (10.1) 

3562 (70.8) 

266 (2.6) 

lien 

335 (3.4) 

3856 (75.1) 

48 (0.5) 

2ifb 

704 (7.1) 

5097 (99.3) 

49 (0.5) 

Ribonuclease Ti 


9870 


lgsp 

442 (4.4) 

1060 (10.7) 

59 (0.6) 

lrhl 

553 (5.6) 

1009 (10.2) 

45 (0.4) 

Iris 

6849 (68.9) 

735 (7.4) 

9090 (90.6) 

Carbonic Anhydrase II 


9857 

7193 (71.7) 

lcil 

3476 (35.0) 

364 (3.7) 

lokl 

1841 (18.5) 

1907 (19.3) 

3105 (30.9) 

lenx 

29 (0.3) 

1429 (14.5) 

9090 (90.6) 


a) First number is a rank of the original ligand of the receptor; second 
number (in paretheses) is a corresponding fraction in the total database. 
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Table 4 Virtual screening functions 


Receptor 

DOCK 

FlexX 

ICM 

Trypsine 


mm 

0.75 

Cytochrome P450 ca m 


B 1 

0 

Neuraminidase 


0.33 

0.83 

Carboxypeptidase 


0.33 

0.83 

L-Arabinose 

1 

1 

1 

e-Trombin 


0.66 

0.5 

Thermolysin 


0.5 

0.5 

Pencillopepsin 


0 

Intestinal Fat-Acid 


0 

0.83 

Ribonuclease Ti 


0 

0.67 

Carbonic Anhydrase II 

0.33 

0.17 

0 

Average 

0.30 

0.32 

0.6 



